UnicodeDecodeError if using AWS ElasticSearch cluster


Issue Summary

It doesn’t work with the Django settings provided in the documentation. With alternative settings (from Wagtail issue 2776), I get a UnicodeDecodeError if any special character is present in Page.title (or, I believe, any other indexable Page field).

When using the AWS ElasticSearch service, the first issue I came across is that the default Django settings from the Wagtail documentation don’t work if the request is signed with AWS4Auth (as recommended by AWS).

I found a working solution in Wagtail issue 2776, in the very last comment by @justinoue.

However, if a Page.title contains a non-ASCII character, for example u'C\xe9line' (which is u'Céline'), update_index breaks with UnicodeDecodeError.
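As an illustration (a minimal sketch, not part of the original report), this is how such a title looks once it is encoded as UTF-8, which is the form it takes inside a request body. The variable names are chosen only for this example.

```python
title = u'C\xe9line'               # 6 code points; \xe9 is e-acute (U+00E9)
encoded = title.encode('utf-8')    # b'C\xc3\xa9line', 7 bytes

# The single non-ASCII character becomes two bytes in UTF-8: 0xc3 0xa9.
first_non_ascii = encoded[1:2]     # b'\xc3'

# Decoding with the matching codec round-trips cleanly.
round_tripped = encoded.decode('utf-8')

print(repr(encoded), len(title), len(encoded))
```

The byte 0xc3 is the first half of the encoded character, so it is the first byte an ASCII decoder would reject.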

Steps to Reproduce

  1. Create an IAM user and attach the AWS managed policy AmazonESFullAccess to it.
  2. Spin up an AWS ElasticSearch instance with an access policy that allows access to the IAM user above.
  3. Set the title of any Page instance to u'C\xe9line' (which is 'C\xc3\xa9line' once encoded as UTF-8).
  4. Use the settings from the Wagtail documentation:
from elasticsearch import RequestsHttpConnection
from requests_aws4auth import AWS4Auth

AWS_ELASTICSEARCH_ACCESS_KEY_ID = '<YOUR_ACCESS_ID>'
AWS_ELASTICSEARCH_SECRET_ACCESS_KEY = '<YOUR_SECRET_ACCESS_KEY>'

WAGTAILSEARCH_BACKENDS = {
     'default': {
         'BACKEND': 'wagtail.wagtailsearch.backends.elasticsearch2',
         'URLS': ['https://<AWS_ES_ENDPOINT>'],
         'INDEX': 'wagtail',
         'TIMEOUT': 5,
         'OPTIONS': {
             'connection_class': RequestsHttpConnection,
         },
         'INDEX_SETTINGS': {},
         'port': 443,
         'use_ssl': True,
         'verify_certs': True,
         'http_auth': AWS4Auth(AWS_ELASTICSEARCH_ACCESS_KEY_ID, AWS_ELASTICSEARCH_SECRET_ACCESS_KEY, '<AWS_ES_REGION>', 'es'),
     }
 }
  5. The settings above are going to break, because the requests are not signed and AWS treats them as coming from an anonymous user. The error looks like this:
elasticsearch.exceptions.AuthorizationException: TransportError(403, u'{"Message":"User: anonymous is not authorized to perform: es:ESHttpDelete on resource: am-dev"}')
  6. Change the settings to match the example from Wagtail issue 2776:
from elasticsearch import RequestsHttpConnection
from requests_aws4auth import AWS4Auth

AWS_ELASTICSEARCH_ACCESS_KEY_ID = '<YOUR_ACCESS_ID>'
AWS_ELASTICSEARCH_SECRET_ACCESS_KEY = '<YOUR_SECRET_ACCESS_KEY>'

WAGTAILSEARCH_BACKENDS = {
    'default': {
        'BACKEND': 'wagtail.wagtailsearch.backends.elasticsearch2',
        'INDEX': 'wagtail',
        'TIMEOUT': 5,
        'HOSTS': [{
            'host': '<AWS_ES_ENDPOINT>',
            'port': 443,
            'use_ssl': True,
            'verify_certs': True,
            'http_auth': AWS4Auth(AWS_ELASTICSEARCH_ACCESS_KEY_ID, AWS_ELASTICSEARCH_SECRET_ACCESS_KEY, '<AWS_ES_REGION>', 'es'),
        }],
        'connection_class': RequestsHttpConnection
    }
}
  7. Run ./manage.py update_index --verbosity=3
  8. You will get this error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 17345: ordinal not in range(128)

Full traceback is here.
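
The second settings block describes each host as a dict rather than a URL string. As a rough sketch of how the two shapes relate, the hypothetical helper below converts a URLS-style string into a HOSTS-style dict using only the standard library. The name url_to_host_entry is invented for this example and is not part of Wagtail or elasticsearch-py. The hostname is a placeholder, and the default ports are an assumption.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse      # Python 2, as used in this report


def url_to_host_entry(url, http_auth=None):
    """Hypothetical helper: build a HOSTS-style dict from a URLS-style string."""
    parsed = urlparse(url)
    use_ssl = parsed.scheme == 'https'
    entry = {
        'host': parsed.hostname,
        # Assumed defaults when the URL carries no explicit port.
        'port': parsed.port or (443 if use_ssl else 80),
        'use_ssl': use_ssl,
        'verify_certs': use_ssl,
    }
    if http_auth is not None:
        # For example, the AWS4Auth(...) object from the settings above.
        entry['http_auth'] = http_auth
    return entry


# Placeholder endpoint, not a real domain from the report.
host_entry = url_to_host_entry('https://search-example.us-east-1.es.amazonaws.com')
print(host_entry)
```

Passing the auth object per host, as the working settings do, keeps the signing configuration next to the endpoint it applies to.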

Notes

  1. Position 17345, where decoding fails, is the index of the first byte of the special character:
In [1]: message_body[17344:17351]
Out[1]: 'C\xc3\xa9line'

or specifically:

In [2]: message_body[17345]
Out[2]: '\xc3'
  2. The settings in step 6 are not documented. Note that they use HOSTS instead of URLS, among other differences.
  3. Interestingly, the issue ultimately traces down to Python’s httplib.py:880, where it tries to do
msg += message_body

where msg is a unicode object but message_body is a byte string (str) containing our special character.

See the gist with variables msg and message_body. Note, some parameters from msg (AWS_ES_ENDPOINT, SHA, AWS_ELASTICSEARCH_ACCESS_KEY_ID, SIGNATURE) are hidden since they contain sensitive information.
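
The failing concatenation described in the notes can be reproduced in isolation. The sketch below is an illustration under assumptions and is not the actual httplib code: the request line and body are made up. Python 2 decodes the byte string implicitly with the 'ascii' codec when it is concatenated with a unicode object. Here that decode is written out explicitly so the snippet also runs on Python 3.

```python
# Made-up stand-ins for the two values that httplib concatenates.
msg = u'POST /_bulk HTTP/1.1\r\n\r\n'                                  # unicode text
message_body = b'{"title": "' + u'C\xe9line'.encode('utf-8') + b'"}'   # UTF-8 bytes

try:
    # Python 2 performs this decode implicitly for `msg += message_body`,
    # using the default 'ascii' codec.
    msg += message_body.decode('ascii')
    error = None
except UnicodeDecodeError as exc:
    error = exc

# The position in the error is the index of the first non-ASCII byte.
bad_index = message_body.index(b'\xc3')
print(error)
print(bad_index)
```

Replacing 'ascii' with 'utf-8' in the decode call makes the concatenation in this sketch succeed, so the body bytes are valid and only the codec used for the implicit conversion is at fault.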

Technical details

  • Python version: Python 2.7.13
  • Django version: Django 1.9.6
  • Wagtail version: Wagtail 1.8.1

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Reactions: 1
  • Comments: 11 (1 by maintainers)

Top GitHub Comments

tomdyson commented, Mar 7, 2017 (1 reaction)

p.s. @StriveForBest thank you for your exemplary issue report!

striveforbest commented, Jan 12, 2018 (0 reactions)

@gasman, makes sense. I will try a similar setup with Elasticsearch 5 and latest Wagtail soon and will open another issue if necessary. I think it’s important to maintain Python 2 support.


