HTTPResponse does not observe file-like object protocol if decode_content=True

My understanding was that urllib3’s HTTPResponse is supposed to behave according to the file-like object protocol if preload_content=False. In particular, I would like to hand it over to another library that is able to read from a file-like object (e.g. lxml).

Unfortunately, HTTPResponse does not appear to implement the protocol properly if decode_content=True. Test case:

import urllib3
print('urllib3 v{}'.format(urllib3.__version__))

http = urllib3.PoolManager()

response = http.request('GET', 'https://github.com',
                        headers={'Accept-Encoding': 'gzip'},
                        preload_content=False,
                        decode_content=False)
print('decode_content=False, read(1): expected 1 bytes, got {} bytes'.format(len(response.read(1))))
print('decode_content=False, read(500): expected 500 bytes, got {} bytes'.format(len(response.read(500))))

response = http.request('GET', 'https://github.com',
                        headers={'Accept-Encoding': 'gzip'},
                        preload_content=False,
                        decode_content=True)
print('decode_content=True, read(1): expected 1 bytes, got {} bytes'.format(len(response.read(1))))
print('decode_content=True, read(500): expected 500 bytes, got {} bytes'.format(len(response.read(500))))

Output:

urllib3 v1.9
decode_content=False, read(1): expected 1 bytes, got 1 bytes
decode_content=False, read(500): expected 500 bytes, got 500 bytes
decode_content=True, read(1): expected 1 bytes, got 0 bytes
decode_content=True, read(500): expected 500 bytes, got 1056 bytes

The problem with the first case is that, as I understand it, read() returning an empty string means that the stream has been fully consumed and/or closed. So a library might request 50 bytes, get an empty string back and assume it has read the entire ‘file’, which wouldn’t be the case with HTTPResponse.

Now you can argue that reading just one byte is unreasonable, but what’s the minimal chunk size that still works? It appears to be 84 bytes for github.com, but it differs for other hosts.
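One plausible explanation for the empty read, sketched below with the standard library only: if read(amt) pulls amt compressed bytes off the socket and returns whatever the gzip decoder emits for them, then a single byte (still inside the gzip header) yields nothing, while a handful of compressed bytes can expand to far more than was asked for. That read(amt) works this way in urllib3 is an assumption here, not something the issue confirms; the payload and variable names are made up for illustration.

```python
import gzip
import zlib

# Illustrative payload; highly compressible so the expansion is obvious.
original = b"x" * 10000
compressed = gzip.compress(original)

# 16 + MAX_WBITS tells zlib to expect gzip framing.
decoder = zlib.decompressobj(16 + zlib.MAX_WBITS)

# One compressed byte is only part of the gzip header, so no output yet.
first = decoder.decompress(compressed[:1])

# The remaining compressed bytes expand to much more than their own length.
rest = decoder.decompress(compressed[1:])

print(len(first))                   # 0
print(len(rest) > len(compressed))  # True
```

If that assumption holds, the minimal chunk size that "works" depends on how many compressed bytes the decoder needs before it can emit anything, which would explain why it differs between hosts.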

The problem with the second case is that the file-object protocol defines the first parameter of read() as “Read at most size bytes from the file”. HTTPResponse, however, returns more. lxml, for example, seems to have internal buffer overflows if read() returns more bytes than it requested.
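As a possible workaround, here is a minimal sketch of an adapter that restores the expected semantics by decoding on top of an undecoded stream and buffering the result. BufferedDecodedReader is a hypothetical name, not part of urllib3; it assumes gzip encoding and a source object whose read(n) returns raw compressed bytes (for example a response requested with decode_content=False). The demo uses io.BytesIO in place of a real response.

```python
import gzip
import io
import zlib


class BufferedDecodedReader:
    """Hypothetical adapter: read(size) returns at most size bytes, and b'' only at EOF."""

    def __init__(self, raw, chunk_size=8192):
        self._raw = raw  # any object whose read(n) returns compressed bytes
        self._decoder = zlib.decompressobj(16 + zlib.MAX_WBITS)  # gzip framing
        self._buffer = b""
        self._eof = False
        self._chunk_size = chunk_size

    def read(self, size=-1):
        if size is None:
            size = -1
        # Keep decoding until enough bytes are buffered or the source is exhausted.
        while not self._eof and (size < 0 or len(self._buffer) < size):
            chunk = self._raw.read(self._chunk_size)
            if not chunk:
                self._buffer += self._decoder.flush()
                self._eof = True
                break
            self._buffer += self._decoder.decompress(chunk)
        if size < 0:
            data, self._buffer = self._buffer, b""
        else:
            # Hand out at most `size` bytes and keep the surplus for the next call.
            data, self._buffer = self._buffer[:size], self._buffer[size:]
        return data


# Demo with an in-memory stand-in for the undecoded response body.
payload = bytes(range(256)) * 40
reader = BufferedDecodedReader(io.BytesIO(gzip.compress(payload)), chunk_size=7)
print(len(reader.read(1)))    # 1
print(len(reader.read(500)))  # 500
```

The surplus stays in an internal buffer instead of being returned, so a consumer such as lxml never receives more than it asked for, and an empty result really does mean end of stream.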

I guess at the very least, the documentation should mention that HTTPResponse is not adhering to the file-like object protocol.

Issue Analytics

  • State: open
  • Created 9 years ago
  • Comments: 33 (18 by maintainers)

Top GitHub Comments

lbt commented, May 29, 2021

Is this still active? It certainly still seems to be a problem when trying to move to urllib3: the response described as “Backwards-compatible with http.client.HTTPResponse” (quote from the docs) can’t be passed to libraries that took http.client.HTTPResponse because of this issue.
