UnicodeDecodeError

Issue Description

I have encountered the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfd in position 1: invalid start byte (raised on line 236 of torrent_parser.py)

Your tests might not have encountered this because of a small sample size. I use your library to process thousands of torrents an hour, which is how I found it.

I have temporarily fixed this by passing the "ignore" error handler to the byte-string decode call, like so:

string = raw.decode(encoding, "ignore")

It would be lovely if you could add this to the upstream repository.

A test file I used is attached: auratorrent.torrent.zip

The pull request is #5.

This is covered here: https://docs.python.org/3/howto/unicode.html
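For reference, here is a minimal sketch of how the error handler passed to bytes.decode changes the outcome. The three-byte sample is hypothetical and was chosen only because it puts the byte 0xfd at position 1, as in the error above. It uses the standard library only and does not touch torrent_parser.

```python
# Hypothetical sample: 0xfd can never start a UTF-8 sequence.
raw = b"a\xfdb"

# Default handler ("strict"): decoding raises UnicodeDecodeError.
try:
    raw.decode("utf-8")
    strict_error = None
except UnicodeDecodeError as exc:
    strict_error = exc
    print(exc)

# "ignore" silently drops the undecodable byte.
ignored = raw.decode("utf-8", "ignore")

# "replace" substitutes U+FFFD (the replacement character) for it.
replaced = raw.decode("utf-8", "replace")

print(repr(ignored))
print(repr(replaced))
```

Both lenient handlers lose information, so they fit display text better than binary fields such as hashes.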

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Comments: 7 (7 by maintainers)

Top GitHub Comments

7sDream commented, Jun 23, 2018

v0.3.0 has just been released.

In this version, there are several ways to deal with this problem:

import torrent_parser as tp

file = 'tests/test_files/utf8.encoding.error.torrent'

# way 1

data = tp.parse_torrent_file(file, errors='ignore')
print(data['magnet-info']['info_hash'])

data = tp.parse_torrent_file(file, errors='replace')
print(data['magnet-info']['info_hash'])

# way 2

data = tp.parse_torrent_file(file, hash_fields={'info_hash': (20, False)})
print(data['magnet-info']['info_hash'])

# way 3

data = tp.parse_torrent_file(file, hash_fields={'info_hash': (20, False)}, hash_raw=True)
print(data['magnet-info']['info_hash'])

# If you don't use any of the options above

try:
    data = tp.parse_torrent_file(file)
except tp.InvalidTorrentDataException as e:
    print(e)

The output:

jysL
�j��y�sL�
36fd06b595119b380df46ab2f2a0b579b1734ca8
b'6\xfd\x06\xb5\x95\x11\x9b8\r\xf4j\xb2\xf2\xa0\xb5y\xb1sL\xa8'
Fail to decode string at pos 16436 using encoding utf-8 when parser field "info_hash", maybe it is an hash field. You can use self.hash_field("info_hash") to let it be treated as hash value, so this error may disappear

A hash_field("info_hash") method has also been added to the parser classes:

with open(file, 'rb') as f:
    data = tp.TorrentFileParser(f).hash_field('info_hash').parse()
    print(data['magnet-info']['info_hash'])
    # 36fd06b595119b380df46ab2f2a0b579b1734ca8

with open(file, 'rb') as f:
    data = tp.BDecoder(f.read()).hash_field('info_hash').decode()
    print(data['magnet-info']['info_hash'])
    # 36fd06b595119b380df46ab2f2a0b579b1734ca8
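
To see why treating info_hash as a hash field avoids the error, here is a small standard-library sketch. It takes the 20 raw bytes from the output above and shows that a hex encoding round-trips them, while a lenient text decode does not. It does not call torrent_parser, and the variable names are illustrative only.

```python
# The raw 20-byte info_hash, copied from the output above.
raw_hash = b'6\xfd\x06\xb5\x95\x11\x9b8\r\xf4j\xb2\xf2\xa0\xb5y\xb1sL\xa8'

# A hash is arbitrary binary data, not text: hex-encode it instead of decoding it.
hex_hash = raw_hash.hex()
print(hex_hash)

# Hex encoding is lossless: the original bytes can be recovered exactly.
assert bytes.fromhex(hex_hash) == raw_hash

# A lenient text decode discards every byte that is not valid UTF-8.
# The result also contains '\r', so a terminal shows only the tail.
lossy = raw_hash.decode("utf-8", "ignore")
print(repr(lossy), len(lossy), "characters left of", len(raw_hash), "bytes")
```

This is why errors='ignore' and errors='replace' make the parse succeed but leave an unusable info_hash, whereas the hash-field options keep the full value.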

7sDream commented, Jun 22, 2018

Thanks for your idea.

I will finish the customizable hash fields API tomorrow and release a new version.

Because of the breaking change and the many things being added, it will be 0.3.0.

(And yes, in 0.x.x a breaking change doesn't need to bump the major version… I'm still considering when to reach 1.0 ⌛)
