Unicode issues with tail plugin
I’m using irssi to write log files for channels, and the tail plugin to parse entries from those files. The log files are encoded as UTF-8. Until recently this worked perfectly, but now the plugin aborts with the error “Failed to decode file using utf-8. Check encoding” any time a Unicode character appears in the log.
Investigating further, I found that it is caused by an exception raised in native_str_to_text():
try:
    line = native_str_to_text(line, encoding=encoding)
except UnicodeError:
    raise plugin.PluginError('Failed to decode file using %s. Check encoding.' % encoding)
if PY2:
    def native_str_to_text(string, **kwargs):
        if 'encoding' not in kwargs:
            kwargs['encoding'] = 'ascii'
        return string.decode(**kwargs)
else:
    def native_str_to_text(string, **kwargs):
        return string
Since I’m running Python 2.7, native_str_to_text() essentially just calls string.decode(). And because the error message contains our preferred encoding, I think we can be fairly certain that we are passing the correct encoding string to the decode method.
I extended the code to print the full exception, and the result was the following output:
'ascii' codec can't encode character u'\u25e2' in position 21: ordinal not in range(128)
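The odd part is that the message talks about *encoding* with ascii even though decode() was called with utf-8. One common way this happens in Python 2 is that the value being decoded is already a unicode object: unicode.decode() first encodes the string with the default ascii codec before decoding it, and that implicit step is what fails. The underlying error is easy to reproduce (a hypothetical snippet, not FlexGet code):

```python
# Encoding a non-ASCII character with the ascii codec raises the same
# error seen in the log. In Python 2, u'...'.decode('utf-8') performs
# exactly this implicit ascii encode before decoding.
try:
    u'\u25e2'.encode('ascii')
except UnicodeEncodeError as exc:
    print(exc)
```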
The strange thing about all this is that when I cloned the git repository and ran the code from there instead of from the installed package, everything worked. So I think this is some locale/environment trouble. I’ve tried recreating the virtualenv, but it had no effect. I’ve also tried setting LC_ALL, PYTHONIOENCODING, and LANG to UTF-8, with no luck.
I was able to recreate the exception in some test code by explicitly setting PYTHONIOENCODING=ascii. So there is definitely some issue with how Python performs decoding in my environment.
However, what did fix the problem was changing tail.py so that it opens the file in binary mode instead, with a simple change:
- with open(filename, 'r') as file:
+ with open(filename, 'rb') as file:
I’m new to Python, so I’m not confident this is a good solution: looking at how native_str_to_text() is defined, it could cause problems for Python 3 users, as I’ve heard that Unicode string handling changed between v2 and v3. Another solution that came to mind was to pass an encoding to the open() call for the file.
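A rough sketch of the binary-mode approach (the helper name and error handling are illustrative, not FlexGet’s actual code): opening with 'rb' yields byte strings on both Python 2 and 3, so the explicit per-line decode behaves identically on each.

```python
def read_tail_lines(filename, encoding='utf-8'):
    # Binary mode gives bytes on Python 2 and 3 alike, so decoding
    # is always an explicit, predictable step with the chosen codec.
    with open(filename, 'rb') as handle:
        for raw_line in handle:
            try:
                yield raw_line.decode(encoding)
            except UnicodeDecodeError:
                raise ValueError(
                    'Failed to decode file using %s. Check encoding.' % encoding)
```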
Config:
taskX:
  tail:
    file: ~/.irssi/logfromchannel.log
    encoding: utf-8
    entry:
      title: nick:\s(.*?)\s:\shttp://.*
      url: nick:\s.*?\s:\s(http://.*)
    format:
      url: '%(url)s'
(... and other settings for the task, not related to tail)
Log:
2016-07-06 10:05 CRITICAL plugin taskX Failed to decode file using utf-8. Check encoding.
2016-07-06 10:05 WARNING task taskX Aborting task (plugin: tail)
Additional information:
- Flexget Version: 2.1.6
- Python Version: 2.7.10
- Installation method: Standard installation from released package using virtualenv and pip
- OS and version: openSUSE 13.1 Linux 3.12.57-44-default #1 SMP Wed Apr 6 09:18:15 UTC 2016 (9b4534f) x86_64 x86_64 x86_64 GNU/Linux
Issue Analytics
- Created 7 years ago
- Comments: 7 (5 by maintainers)
Top GitHub Comments
I think the proper fix is this: switch to io.open, which works the same across Python versions and is the default implementation of open on Python 3. We should make the default encoding utf-8 instead of ascii, which is almost always a saner choice. We shouldn’t have to deal with line-by-line decoding anymore, and I don’t think we ever needed the native_str_to_text utility, as the types here should already be consistent across Python versions.

Same happens to the exec plugin:
BUG: Unhandled error in plugin exec: 'ascii' codec can't encode character u'\xf1' in position 104: ordinal not in range(128)
It was working fine until recently, but now it crashes if the given path contains Unicode characters (e.g. “ñ”).
Running version 2.1.5 (unable to upgrade to the latest because the sqlalchemy update fails) on Windows 8.1 x64 + Python 2.7.10.
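Going back to the io.open fix suggested in the first comment, a minimal sketch (the helper name is illustrative, not FlexGet’s actual code) could look like this. io.open is the built-in open on Python 3 and is also available on Python 2; it decodes while reading, so every line comes back as text and no native_str_to_text() step is needed:

```python
import io

def read_text_lines(filename, encoding='utf-8'):
    # io.open returns a text-mode file object on both Python 2 and 3,
    # decoding with the given codec as it reads.
    with io.open(filename, 'r', encoding=encoding) as handle:
        return handle.readlines()
```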