Errors While Decoding Response Text Using mitmdump

Problem Description

When I visit websites containing Arabic characters through mitmdump with a small addon script and read the response text, I get the following error:

Traceback (most recent call last):
  File "main.py", line 36, in response
    response_text = flow.response.text
  File "c:\users\evead-61\appdata\local\programs\python\python38\lib\site-packages\mitmproxy\net\http\message.py", line 232, in get_text
    return cast(str, encoding.decode(content, enc))
  File "c:\users\evead-61\appdata\local\programs\python\python38\lib\site-packages\mitmproxy\net\http\encoding.py", line 76, in decode
    raise ValueError("{} when decoding {} with {}: {}".format(
ValueError: UnicodeDecodeError when decoding b'GIF89a\x with 'UTF-8': UnicodeDecodeError('utf-8', b'GIF89a\x01\x00\x01\x00\xf0\x00\x00\x00\x00\x00\x00\x00\x00!\xf9\x04\x01\x00\x00\x00\x00,\x00\x00\x00\x00\x01\x00\x01\x00\x00\x02\x02D\x01\x00;

Steps to reproduce the behavior:

  1. Write a small addon that implements the “response()” hook
  2. In that hook, assign flow.response.text to a variable
  3. Run it with mitmdump -s main.py --anticomp (assuming your file is called main.py)
  4. Visit a website such as chouftv.ma
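
For anyone hitting the same thing, here is a minimal sketch of an addon along the lines of these steps, with a guard added so the ValueError from the traceback above does not abort the hook. `safe_text` is a hypothetical helper name, not part of mitmproxy, and catching ValueError is an assumption based only on the traceback in this report (mitmproxy 6.0.2); other versions may behave differently.

```python
# main.py -- sketch of a defensive addon; run with: mitmdump -s main.py --anticomp

def safe_text(response):
    """Return the decoded body, or None when decoding fails.

    Hypothetical helper: per the traceback in this report, accessing
    response.text raises ValueError when the body cannot be decoded.
    """
    try:
        return response.text
    except ValueError:
        return None


def response(flow):
    # mitmproxy calls this hook once per completed response.
    text = safe_text(flow.response)
    if text is None:
        print(f"skipped undecodable body: {flow.request.url}")
        return
    print(flow.request.url, len(text))
```

This only sidesteps the error in the addon. It does not change how mitmproxy guesses the encoding.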

System Information

Mitmproxy: 6.0.2
Python: 3.8.7
OpenSSL: OpenSSL 1.1.1i 8 Dec 2020
Platform: Windows-10-10.0.17763-SP0

Issue Analytics

  • State: open
  • Created 3 years ago
  • Comments: 7 (6 by maintainers)

Top GitHub Comments

2 reactions
Prinzhorn commented, Jan 27, 2021

ouch oof owie my bytes

content-type: image/gif; charset=utf-8

this comes from the https://collector.githubapp.com/github/page_view tracking pixel

So I guess we need to be more intelligent when doing guess_encoding (_get_content_type_charset())? @mhils
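
The mismatch can be reproduced without mitmproxy at all. The sketch below takes the charset from the header shown above and tries to decode the pixel bytes from the traceback further down with it. `declared_charset` is a hypothetical helper built on the standard library's `email.message.Message`; it only illustrates the problem and is not how mitmproxy parses headers.

```python
from email.message import Message

# Body of the tracking pixel, copied from the traceback in this comment.
PIXEL = (
    b"GIF89a\x01\x00\x01\x00\x80\xff\x00\xff\xff\xff\x00\x00\x00,"
    b"\x00\x00\x00\x00\x01\x00\x01\x00\x00\x02\x02D\x01\x00;"
)


def declared_charset(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["content-type"] = content_type
    return msg.get_content_charset()


charset = declared_charset("image/gif; charset=utf-8")

try:
    PIXEL.decode(charset)
    error = None
except UnicodeDecodeError as exc:
    # The header claims utf-8, but the body is binary GIF data.
    error = exc

print(charset, error)
```

The decode fails on byte 0x80 at position 10, the same position reported in the traceback.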

I was using

def response(flow):
    print(flow.request.url)
    print(len(flow.response.text))

and visiting GitHub will cause (on master)

Addon error: Traceback (most recent call last):
  File "/home/alex/Projects/super-top-secret/src/forks/mitmproxy/mitmproxy/net/http/encoding.py", line 67, in decode
    decoded = custom_decode[encoding](encoded)
KeyError: 'utf-8'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/alex/Projects/super-top-secret/src/forks/mitmproxy/mitmproxy/net/http/encoding.py", line 69, in decode
    decoded = codecs.decode(encoded, encoding, errors)  # type: ignore
  File "/usr/lib/python3.8/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 10: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/alex/Projects/super-top-secret/src/issues/mitmproxy-4415/main.py", line 3, in response
    print(len(flow.response.text))
  File "/home/alex/Projects/super-top-secret/src/forks/mitmproxy/mitmproxy/net/http/message.py", line 244, in get_text
    return cast(str, encoding.decode(content, enc))
  File "/home/alex/Projects/super-top-secret/src/forks/mitmproxy/mitmproxy/net/http/encoding.py", line 76, in decode
    raise ValueError("{} when decoding {} with {}: {}".format(
ValueError: UnicodeDecodeError when decoding b'GIF89a\x with 'utf-8': UnicodeDecodeError('utf-8', b'GIF89a\x01\x00\x01\x00\x80\xff\x00\xff\xff\xff\x00\x00\x00,\x00\x00\x00\x00\x01\x00\x01\x00\x00\x02\x02D\x01\x00;', 10, 11, 'invalid start byte')

0 reactions
Prinzhorn commented, Feb 27, 2021

Do a better job at guessing. TL;DR, this may fix some occasions, but doesn’t solve the problem. What’s the proper encoding of the tracking pixel above? “binary” is not a valid encoding.

I personally prefer this option since the problem cannot really be solved. Being able to replace stuff inside binary bodies is neat, e.g. search & replace metadata in images or PDFs. And I guess latin-1 gets that job done and keeping it for backwards compat is nice. I would try to expand our heuristics and add new special cases as we find them. I assume that’s basically what browser vendors do but by now they’ve seen 99.9999% of weird shit.

Now the fun begins. We can fall back to latin-1 for image/* but not for image/svg+xml. Same for audio/*, video/* and application/octet-stream.

If we can agree that this is a valid solution I’ll grab a list of common mime types and improve the guessing we currently have.
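
A rough sketch of what such a rule could look like, assuming it is driven purely by the MIME type. `fallback_charset` and the constants are hypothetical names for illustration, not mitmproxy's actual API or its eventual fix, and the list only covers the types mentioned in this comment.

```python
from typing import Optional

# Types treated as binary in this sketch (taken from the comment above).
BINARY_PREFIXES = ("image/", "audio/", "video/")
BINARY_EXACT = {"application/octet-stream"}
# Text formats that live under a binary-looking prefix.
TEXT_EXCEPTIONS = {"image/svg+xml"}


def fallback_charset(content_type: str) -> Optional[str]:
    """Return "latin-1" for binary MIME types, else None.

    None means "no special case applies, keep the usual charset guessing".
    """
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime in TEXT_EXCEPTIONS:
        return None
    if mime in BINARY_EXACT or mime.startswith(BINARY_PREFIXES):
        return "latin-1"
    return None


# The tracking pixel from the traceback: latin-1 maps every byte to one
# code point, so decoding never fails and re-encoding restores the bytes.
pixel = (
    b"GIF89a\x01\x00\x01\x00\x80\xff\x00\xff\xff\xff\x00\x00\x00,"
    b"\x00\x00\x00\x00\x01\x00\x01\x00\x00\x02\x02D\x01\x00;"
)
enc = fallback_charset("image/gif; charset=utf-8")
round_tripped = pixel.decode(enc).encode(enc)
```

Since latin-1 round-trips losslessly, search & replace on the decoded string still produces valid bytes as long as the replacement text itself fits in latin-1.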
