UnicodeDecodeError

I have encountered the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfd in position 1: invalid start byte thrown on line 236 of torrent_parser.py

Your tests might not have encountered this because of a small sample size. I use your library to process thousands of torrents an hour, which is how I found it.

I have temporarily fixed this by passing the "ignore" error handler to the byte string decode call, like so:

string = raw.decode(encoding, "ignore")
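For context, here is a minimal sketch of how Python's standard decode error handlers behave on an invalid byte. The sample bytes are made up for illustration and are not taken from the attached torrent.

```python
# Hypothetical sample: 0xfd can never start a UTF-8 sequence.
raw = b"a\xfdb"

strict_error = None
try:
    raw.decode("utf-8")  # the default handler is errors="strict"
except UnicodeDecodeError as exc:
    strict_error = exc
    print(exc)

# "ignore" drops the offending byte; "replace" substitutes U+FFFD.
ignored = raw.decode("utf-8", "ignore")
replaced = raw.decode("utf-8", "replace")
print(repr(ignored), repr(replaced))
```

Both handlers discard the original byte, so they are a better fit for display text than for binary fields whose exact bytes matter.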

It would be lovely if you could add this upstream.

A test file I used is attached: auratorrent.torrent.zip

The pull request is #5.

This is covered here: https://docs.python.org/3/howto/unicode.html

Issue Analytics

Created 5 years ago
Comments:7 (7 by maintainers)

Top GitHub Comments

7sDream commented, Jun 23, 2018

v0.3.0 just released.

In this version, there are several ways to deal with this problem:

import torrent_parser as tp

file = 'tests/test_files/utf8.encoding.error.torrent'

# way 1

data = tp.parse_torrent_file(file, errors='ignore')
print(data['magnet-info']['info_hash'])

data = tp.parse_torrent_file(file, errors='replace')
print(data['magnet-info']['info_hash'])

# way 2

data = tp.parse_torrent_file(file, hash_fields={'info_hash': (20, False)})
print(data['magnet-info']['info_hash'])

# way 3

data = tp.parse_torrent_file(file, hash_fields={'info_hash': (20, False)}, hash_raw=True)
print(data['magnet-info']['info_hash'])

# If you don't use any of the options above

try:
    data = tp.parse_torrent_file(file)
except tp.InvalidTorrentDataException as e:
    print(e)

The output:

jysL
�j��y�sL�
36fd06b595119b380df46ab2f2a0b579b1734ca8
b'6\xfd\x06\xb5\x95\x11\x9b8\r\xf4j\xb2\xf2\xa0\xb5y\xb1sL\xa8'
Fail to decode string at pos 16436 using encoding utf-8 when parser field "info_hash", maybe it is an hash field. You can use self.hash_field("info_hash") to let it be treated as hash value, so this error may disappear

A hash_field("info_hash") method has also been added to the parser classes:

with open(file, 'rb') as f:
    data = tp.TorrentFileParser(f).hash_field('info_hash').parse()
    print(data['magnet-info']['info_hash'])
    # 36fd06b595119b380df46ab2f2a0b579b1734ca8

with open(file, 'rb') as f:
    data = tp.BDecoder(f.read()).hash_field('info_hash').decode()
    print(data['magnet-info']['info_hash'])
    # 36fd06b595119b380df46ab2f2a0b579b1734ca8
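To illustrate why treating the field as a hash helps, here is a small sketch that uses only the standard library and the 20 raw bytes printed by way 3 above. It does not call torrent_parser. It only shows that hex encoding is lossless, whereas decoding the same bytes with errors='ignore' throws data away.

```python
# The raw info_hash bytes, copied from the way 3 output above.
raw_hash = b'6\xfd\x06\xb5\x95\x11\x9b8\r\xf4j\xb2\xf2\xa0\xb5y\xb1sL\xa8'

# Hex encoding keeps every byte and can be reversed exactly.
hex_hash = raw_hash.hex()
print(hex_hash)
assert bytes.fromhex(hex_hash) == raw_hash

# Decoding as text with "ignore" silently drops the invalid bytes,
# so the original 20 bytes cannot be recovered from the result.
lossy = raw_hash.decode("utf-8", errors="ignore")
print(len(raw_hash), len(lossy.encode("utf-8")))
```

This is why the hex form is the one to store or compare when the exact hash value matters.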

7sDream commented, Jun 22, 2018

Thanks for your idea.

I will finish the custom hash fields API tomorrow and release a new version.

Because of the breaking change and the many things being added, it will be 0.3.0.

(And yes, in 0.x.x a breaking change doesn’t need to bump the major version… I’m still considering when to reach 1.0 ⌛)
