UnicodeEncodeError: Unencoded UTF-8 unicode in ARSC._analyze

The bug is here: https://github.com/androguard/androguard/blob/a18256203d7af751c8862b04b15b15c23225b54f/androguard/core/bytecodes/axml.py#L1017

language and region are of type unicode but are being used as str. Under Python 2 this triggers an implicit ASCII encode, which raises the following error:


UnicodeEncodeError: 'ascii' codec can't encode character u'\xa4' in position 0: ordinal not in range(128)

Here u'\xa4' is the first character of some non-ASCII region name.
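
To make the failure concrete, here is a minimal sketch that reproduces the same message outside of androguard. It assumes only the character from the traceback; the name encode_as_ascii is made up for illustration and is not part of androguard. It does explicitly what Python 2 does implicitly whenever a unicode value is used where a str is expected.

```python
# -*- coding: utf-8 -*-
# Illustrative only: `encode_as_ascii` is a hypothetical helper, not androguard code.

region = u"\xa4"  # the character reported in the traceback (U+00A4)


def encode_as_ascii(text):
    """Encode text with the ASCII codec, as Python 2 does implicitly for str(unicode)."""
    return text.encode("ascii")


try:
    encode_as_ascii(region)
    error_message = None
except UnicodeEncodeError as exc:
    # Same failure as in the report: the character is outside the ASCII range.
    error_message = str(exc)

# Encoding explicitly with UTF-8 succeeds and produces two bytes.
region_utf8 = region.encode("utf-8")
```

The sketch runs on both Python 2.7 and Python 3; only the repr of the character in the message differs between the two.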

Androguard Version: master
Python Version: 2.7.10
Operating System: Mac OS

Issue Analytics

State:
Created 5 years ago
Comments: 6 (6 by maintainers)

Top GitHub Comments

1 reaction

reox commented, Dec 3, 2018

that would be very appreciated, thanks!

Are we looking for a more “universal” fix? Then we'd be better off looking into ARSCResTableConfig.

Yes and no. I read through the whole ARSC parser once, and I think much of it needs to be rewritten anyway. The universal fix would be to decide which parts are actually strings and which are bytes. Unfortunately, Python 2 was very sloppy with bytes/string conversions, so many problems show up now that androguard runs on Python 3. So please, do not chase the rabbit too long. If the fix works in both Python 3 and Python 2, it's fine for now!
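
One way such a fix could look on both interpreters is sketched below. This is an assumption for illustration, not the patch that was applied to androguard: to_text and format_locale are hypothetical names, and the "-r" separator is only a guess at how a locale label might be built. The idea is to decode bytes to text once at the boundary and to format with a unicode template, so no implicit ASCII conversion happens.

```python
# -*- coding: utf-8 -*-
# Hypothetical sketch: `to_text` and `format_locale` are not androguard functions.


def to_text(value, encoding="utf-8"):
    """Return value as text; decode it first if it is a byte string.

    On Python 2, `bytes` is an alias for `str`, so this also covers plain str.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def format_locale(language, region):
    """Build a locale label from text or bytes without implicit encoding."""
    # The u"" template keeps the result as text on Python 2 as well.
    return u"{}-r{}".format(to_text(language), to_text(region))


label = format_locale(b"en", u"\xa4")
```

Keeping everything as text internally and encoding only when writing output avoids the mixed bytes/str handling that the comment describes.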

0 reactions

reox commented, Jan 3, 2019

It looks like the fix did not work and made the resource parser unusable…
