
UTF-8 serialization in Python 2


I’m running Python 2.7 and connecting to the AWS Elasticsearch Service using the 2.2 release of elasticsearch-py. To connect, I use requests_aws4auth as recommended in your docs (thanks for integrating that!).

When writing to Elasticsearch (bulk uploads, creating a document, etc.), I get this error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 2102: ordinal not in range(128)
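For context, this is the generic Python 2 failure mode: when a UTF-8 byte string is mixed with unicode text, Python 2 implicitly decodes the bytes with the ASCII codec, and the first byte above 127 aborts the decode. 0xe2 is the UTF-8 lead byte of many common punctuation characters, such as the curly apostrophe U+2019. The sketch below performs that decode explicitly so the same error can be reproduced on Python 2.7 or 3. The sample text and the `ascii_decode_error` helper are illustrative, and the sketch does not show where in the request path the implicit decode actually happens.

```python
# Python 2 implicitly decodes byte strings with the ASCII codec when they are
# combined with unicode text; this performs the same decode explicitly.
def ascii_decode_error(raw_bytes):
    """Return the message raised when raw_bytes is decoded as ASCII, or None."""
    try:
        raw_bytes.decode("ascii")
    except UnicodeDecodeError as err:
        return str(err)
    return None


# Illustrative payload: U+2019 (right single quotation mark) is e2 80 99 in UTF-8
payload = u"It\u2019s here".encode("utf-8")

message = ascii_decode_error(payload)
print(message)
# 'ascii' codec can't decode byte 0xe2 in position 2: ordinal not in range(128)
```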

I know the library was changed a couple of months ago to stop hiding Unicode errors, but that change coincides with the introduction of requests_aws4auth support (both first appear in the 2.2 release), so downgrading is not an option for me. Handling Unicode conversion myself, piecemeal, is non-trivial, and upgrading to Python 3 is not yet an option given other dependencies.

Therefore, I have come up with a temporary workaround: a custom serializer that essentially reverts the Unicode change made earlier to this codebase:

```python
import json  # needed for json.dumps below; missing from the original snippet

from elasticsearch import Elasticsearch, RequestsHttpConnection, serializer, compat, exceptions


class JSONSerializerPython2(serializer.JSONSerializer):
    """Override the elasticsearch library serializer so that non-ASCII characters
    are escaped during the JSON dump.
    See original at: https://github.com/elastic/elasticsearch-py/blob/master/elasticsearch/serializer.py#L42
    A description of how ensure_ascii escapes unicode characters so they can be sent
    across the wire as ASCII can be found here: https://docs.python.org/2/library/json.html#basic-usage
    """
    def dumps(self, data):
        # don't serialize strings; pass them through unchanged
        if isinstance(data, compat.string_types):
            return data
        try:
            # ensure_ascii=True escapes every non-ASCII character as \uXXXX,
            # so the serialized body is pure ASCII
            return json.dumps(data, default=self.default, ensure_ascii=True)
        except (ValueError, TypeError) as e:
            raise exceptions.SerializationError(data, e)
```

I hope this helps anyone else who runs into this issue.
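The workaround hinges on the `ensure_ascii` flag of the standard-library `json.dumps`. Below is a minimal sketch (standard library only, runs on Python 2.7 and 3) of what the flag changes for a document containing one non-ASCII character; the `doc` dictionary is illustrative data, not anything from the issue.

```python
import json

doc = {u"title": u"caf\u00e9"}  # illustrative document with one non-ASCII character (é)

# ensure_ascii=True escapes every non-ASCII character as a \uXXXX sequence,
# so the serialized body is plain ASCII whatever the input contains
escaped = json.dumps(doc, ensure_ascii=True)

# ensure_ascii=False keeps the character as-is, so the result is ASCII-only
# only if the input happens to be ASCII
raw = json.dumps(doc, ensure_ascii=False)

print(escaped)                                 # {"title": "caf\u00e9"}
print(all(ord(ch) < 128 for ch in escaped))   # True
print(all(ord(ch) < 128 for ch in raw))       # False
```

Because the escaped body contains only ASCII, nothing downstream has to decode non-ASCII bytes, and any JSON parser reads the `\u00e9` escape back as the original character.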

Issue Analytics

  • State: closed
  • Created 8 years ago
  • Reactions: 19
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

16 reactions
honzakral commented, May 12, 2016

@vinitkumar, you should never need to fork the repo; you can simply pass in your own serializer:

```python
from elasticsearch import Elasticsearch
es = Elasticsearch(..., serializer=JSONSerializerPython2())
```

0 reactions
honzakral commented, Jul 15, 2016

@LucasBerbesson, you’d have to convert everything to unicode before passing the data into super(); otherwise you’d get the same issue.
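One way to act on that advice is a recursive helper that decodes every byte string in the document to unicode before it reaches the serializer. This is a sketch, not part of elasticsearch-py: `to_unicode` is a hypothetical helper name, and it assumes all byte strings are UTF-8 encoded.

```python
import json


def to_unicode(value, encoding="utf-8"):
    """Recursively decode byte strings inside dicts, lists and tuples to text.

    Hypothetical helper; assumes every byte string is valid `encoding` data.
    On Python 2 `bytes` is an alias of `str`, so plain byte strings match.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, dict):
        return dict((to_unicode(k, encoding), to_unicode(v, encoding))
                    for k, v in value.items())
    if isinstance(value, (list, tuple)):
        # JSON has no tuple type, so tuples become lists here as well
        return [to_unicode(item, encoding) for item in value]
    # numbers, booleans, None and text are returned unchanged
    return value


# Mixed input: byte-string keys and values holding UTF-8 encoded text
doc = {b"title": b"caf\xc3\xa9", u"tags": [b"na\xc3\xafve", 1]}
clean = to_unicode(doc)
body = json.dumps(clean, ensure_ascii=True, sort_keys=True)
print(body)  # {"tags": ["na\u00efve", 1], "title": "caf\u00e9"}
```

Inside a serializer subclass, the same idea would be `super(MySerializer, self).dumps(to_unicode(data))` (with `MySerializer` a hypothetical class name), so the base class only ever sees text.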
