
Unicode issues with tail plugin


I’m using irssi to write log files for channels, and the tail plugin to parse entries from these files. The log files are UTF-8 encoded. Until recently this worked perfectly, but now the plugin aborts with the error “Failed to decode file using utf-8. Check encoding” whenever a non-ASCII character appears in the log.

On further investigation, I found that it is caused by an exception thrown in native_str_to_text():

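# tail plugin: convert each line read from the file to text
try:
    line = native_str_to_text(line, encoding=encoding)
except UnicodeError:
    raise plugin.PluginError('Failed to decode file using %s. Check encoding.' % encoding)

# definition of native_str_to_text()
if PY2:
    def native_str_to_text(string, **kwargs):
        # Python 2: call .decode() on the string, defaulting to ASCII
        if 'encoding' not in kwargs:
            kwargs['encoding'] = 'ascii'
        return string.decode(**kwargs)
else:
    def native_str_to_text(string, **kwargs):
        # Python 3: str is already text, so return it unchanged
        return string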

Since I’m running Python 2.7, native_str_to_text() essentially just calls string.decode(). And because the error message contains our configured encoding (utf-8), I think we can be fairly certain that decode() is actually receiving the correct encoding string.

I extended the code to print the full exception, which gave the following output:

'ascii' codec can't encode character u'\u25e2' in position 21: ordinal not in range(128)
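The word “encode” in an error raised by a decode call is a useful clue. On Python 2, calling .decode() on a value that is already a unicode object makes Python first encode it with the default ASCII codec, which fails on any non-ASCII character. So the line was most likely already text, not a byte string, when it reached native_str_to_text(). A minimal sketch of a more defensive helper, assuming the goal is simply to end up with text whatever open() returned, decodes only real bytes; the name to_text is hypothetical, not a FlexGet function:

```python
def to_text(value, encoding="utf-8"):
    """Return value as text, decoding it only if it is a byte string.

    Hypothetical helper: text input is passed through unchanged, so on
    Python 2 it never triggers an implicit ASCII encode.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value
```

On Python 2, bytes is an alias of str, so a unicode line skips the decode branch; on Python 3 the same check separates bytes from str.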

The strange thing is that when I cloned the git repository and ran the code from there instead of the installed package, everything worked. So I suspect a locale/environment problem. Recreating the virtualenv had no effect, and setting LC_ALL, PYTHONIOENCODING and LANG to UTF-8 didn’t help either.

I was able to reproduce the exception in some test code by explicitly setting PYTHONIOENCODING=ascii, so there is clearly some issue with how Python performs decoding in my environment.
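A decode mismatch can also be reproduced without touching the environment, by passing the wrong encoding explicitly. Note that PYTHONIOENCODING only governs sys.stdin, sys.stdout and sys.stderr; it does not change how open() decodes files. The sketch below (the file contents and the helper name decode_check are made up for illustration) writes one UTF-8 line and reads it back with io.open as ASCII and as UTF-8:

```python
import io
import os
import tempfile


def decode_check(path, encoding):
    """Try to read path as text with an explicit encoding.

    Returns the text on success, or None if the file's bytes are not
    valid in that encoding.
    """
    try:
        with io.open(path, "r", encoding=encoding) as handle:
            return handle.read()
    except UnicodeDecodeError:
        return None


# Hypothetical sample: one UTF-8 log line containing U+25E2.
fd, sample_path = tempfile.mkstemp(suffix=".log")
with io.open(fd, "w", encoding="utf-8") as handle:
    handle.write(u"nick: \u25e2 : http://example.com\n")

ascii_result = decode_check(sample_path, "ascii")  # mismatched encoding
utf8_result = decode_check(sample_path, "utf-8")   # matching encoding
os.remove(sample_path)
```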

However, what did fix the problem was changing tail.py to open the file in binary mode, with this one-line change:

- with open(filename, 'r') as file:
+ with open(filename, 'rb') as file:
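Binary mode works because each line then arrives as a byte string, which is exactly what .decode() expects. A minimal sketch of that approach, decoding explicitly so it behaves the same on Python 2 and 3 (the function name is hypothetical, not FlexGet’s actual code):

```python
import io
import os
import tempfile


def read_decoded_lines(path, encoding="utf-8"):
    """Yield each line of path as text.

    Opening in binary mode ("rb") makes every line a byte string on both
    Python 2 and 3, so decoding it explicitly is always valid.
    """
    with io.open(path, "rb") as handle:
        for raw_line in handle:
            yield raw_line.decode(encoding)


# Usage with a throwaway file standing in for the irssi log.
fd, log_path = tempfile.mkstemp(suffix=".log")
os.write(fd, u"first \u25e2\nsecond\n".encode("utf-8"))
os.close(fd)
lines = list(read_decoded_lines(log_path))
os.remove(log_path)
```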

I’m new to Python, so I’m not confident this is a good solution. Judging by how native_str_to_text() is defined, it could cause problems for Python 3 users: on Python 3 that function returns its argument unchanged, so lines read in binary mode would stay bytes rather than text. Another solution that came to mind was to pass an encoding to the open() call.

Config:

taskX:
  tail:
    file: ~/.irssi/logfromchannel.log
    encoding: utf-8
    entry:
      title: nick:\s(.*?)\s:\shttp://.*
      url: nick:\s.*?\s:\s(http://.*)
    format:
      url: '%(url)s'
  # (... and other settings for the task, not related to tail)
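The entry patterns are ordinary Python regular expressions, so they can be checked in isolation against a decoded line. The sketch below assumes the plugin applies them with a search rather than an anchored match, and uses a made-up log line containing a non-ASCII character; once the line is text, the patterns match it without any special handling:

```python
import re

# Patterns copied from the tail config (raw strings keep the backslashes).
TITLE_RE = re.compile(r"nick:\s(.*?)\s:\shttp://.*")
URL_RE = re.compile(r"nick:\s.*?\s:\s(http://.*)")

# Hypothetical log line, already decoded to text.
sample = u"10:05 nick: \u25e2 alice : http://example.com/page"

title = TITLE_RE.search(sample).group(1)
url = URL_RE.search(sample).group(1)
```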

Log:

2016-07-06 10:05 CRITICAL plugin        taskX      Failed to decode file using utf-8. Check encoding.
2016-07-06 10:05 WARNING  task          taskX      Aborting task (plugin: tail)

Additional information:

  • Flexget Version: 2.1.6
  • Python Version: 2.7.10
  • Installation method: Standard installation from released package using virtualenv and pip
  • OS and version: openSUSE 13.1 Linux 3.12.57-44-default #1 SMP Wed Apr 6 09:18:15 UTC 2016 (9b4534f) x86_64 x86_64 x86_64 GNU/Linux

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 7 (5 by maintainers)

Top GitHub Comments

1 reaction
gazpachoking commented, Jul 6, 2016

I think the proper fix is this: switch to io.open, which works the same across Python versions and is the default implementation of open on Python 3. We should make the default encoding utf-8 instead of ascii, which is almost always the saner choice. We shouldn’t have to deal with line-by-line decoding anymore, and I don’t think we ever needed the native_str_to_text utility, as the types here should all be consistent across Python versions already.
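A minimal sketch of what that suggestion could look like, assuming the plugin reads the file through io.open with a utf-8 default and maps decode failures to the same error message as before. PluginError here is a local stand-in, not FlexGet’s class, and read_log_lines is a hypothetical name; the sketch covers only the open/decode step, not the tailing logic:

```python
import io
import os
import tempfile


class PluginError(Exception):
    """Stand-in for FlexGet's plugin.PluginError so the sketch runs alone."""


def read_log_lines(filename, encoding="utf-8"):
    """Return the lines of filename as text, decoded with `encoding`.

    io.open behaves the same on Python 2 and 3 (it is the built-in open
    on Python 3), so no per-line decoding helper is needed.
    """
    try:
        with io.open(filename, "r", encoding=encoding) as handle:
            return handle.readlines()
    except UnicodeDecodeError:
        raise PluginError("Failed to decode file using %s. Check encoding." % encoding)


# Usage with a throwaway UTF-8 file.
fd, log_path = tempfile.mkstemp(suffix=".log")
with io.open(fd, "w", encoding="utf-8") as handle:
    handle.write(u"nick: \u25e2 : http://example.com\n")

lines = read_log_lines(log_path)  # uses the utf-8 default
try:
    read_log_lines(log_path, encoding="ascii")
    ascii_failed = False
except PluginError:
    ascii_failed = True
os.remove(log_path)
```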

0 reactions
regystro commented, Jul 11, 2016

The same happens with the exec plugin:

BUG: Unhandled error in plugin exec: 'ascii' codec can't encode character u'\xf1' in position 104: ordinal not in range(128)

It was working until recently, but now it crashes whenever the given path contains Unicode characters (e.g. “ñ”).

Running version 2.1.5 (unable to upgrade to the latest version because the SQLAlchemy update fails) on Windows 8.1 x64 + Python 2.7.10
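That message is the encode-side counterpart of the tail error: some text containing “ñ” is being converted to bytes with the ASCII codec. The sketch below only reproduces the message with a made-up path; it does not show where the exec plugin performs that conversion, which would need to be confirmed in its source:

```python
def ascii_encode_error(text):
    """Return the UnicodeEncodeError raised by encoding text as ASCII, or None."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        return exc
    return None


# Hypothetical Windows path containing "ñ" (U+00F1).
err = ascii_encode_error(u"C:\\Descargas\\Espa\u00f1a")
```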
