HTTPResponse does not observe file-like object protocol if decode_content=True

My understanding was that urllib3’s HTTPResponse is supposed to behave according to the file-like object protocol if preload_content=False. In particular, I would like to hand it over to another library that is able to read from a file-like object (e.g. lxml).

Unfortunately, HTTPResponse does not appear to implement the protocol properly if decode_content=True. Test case:

import urllib3
print('urllib3 v{}'.format(urllib3.__version__))

http = urllib3.PoolManager()

response = http.request('GET', 'https://github.com',
                        headers={'Accept-Encoding': 'gzip'},
                        preload_content=False,
                        decode_content=False)
print('decode_content=False, read(1): expected 1 bytes, got {} bytes'.format(len(response.read(1))))
print('decode_content=False, read(500): expected 500 bytes, got {} bytes'.format(len(response.read(500))))

response = http.request('GET', 'https://github.com',
                        headers={'Accept-Encoding': 'gzip'},
                        preload_content=False,
                        decode_content=True)
print('decode_content=True, read(1): expected 1 bytes, got {} bytes'.format(len(response.read(1))))
print('decode_content=True, read(500): expected 500 bytes, got {} bytes'.format(len(response.read(500))))

Output:

urllib3 v1.9
decode_content=False, read(1): expected 1 bytes, got 1 bytes
decode_content=False, read(500): expected 500 bytes, got 500 bytes
decode_content=True, read(1): expected 1 bytes, got 0 bytes
decode_content=True, read(500): expected 500 bytes, got 1056 bytes

The problem with the first case (read(1) returning 0 bytes) is that, as I understand it, .read()=='' means that the stream has been consumed and/or closed. So a library might ask for 50 bytes, get an empty result back and assume it has read the entire ‘file’, which wouldn’t be the case with HTTPResponse.
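
To make the concern concrete, here is a minimal sketch of the kind of read loop many consumers use. `read_all` and `EarlyEmptyStream` are hypothetical names made up for illustration, not part of urllib3 or lxml; the fake stream simply returns an empty result once before delivering data, mimicking the read(1) case above.

```python
import io


def read_all(fileobj, chunk_size=50):
    """Typical consumer loop: keep reading until read() returns nothing."""
    parts = []
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:  # an empty result is treated as end of file
            break
        parts.append(chunk)
    return b"".join(parts)


class EarlyEmptyStream:
    """Hypothetical stand-in: returns b"" once before any real data."""

    def __init__(self, payload):
        self._pending = [b"", payload]

    def read(self, size=-1):
        # Ignores `size`; only the early empty result matters here.
        return self._pending.pop(0) if self._pending else b""


# A well-behaved file-like object is read completely.
complete = read_all(io.BytesIO(b"x" * 120))

# The early empty read makes the loop stop before any data arrives.
truncated = read_all(EarlyEmptyStream(b"<html>...</html>"))

print(len(complete), len(truncated))
```

With the well-behaved stream the loop collects everything; with the early empty read it gives up immediately and the payload is silently lost.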

Now you can argue that reading just one byte is unreasonable, but what’s the minimal chunk size that still works? It appears to be 84 bytes for github.com, but it differs for other hosts.
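
One plausible explanation for the host-dependent threshold, offered as an assumption rather than something confirmed in this issue, is that read(n) pulls n compressed bytes off the wire and only then decompresses them. The sketch below uses just the standard library's gzip and zlib modules (no urllib3) to show that a gzip decoder fed a single byte produces no output, while a small compressed chunk can expand to far more bytes than were fed in.

```python
import gzip
import zlib

# Highly compressible sample body, standing in for a gzip-encoded response.
payload = b"a" * 10000
compressed = gzip.compress(payload)

# wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
decoder = zlib.decompressobj(16 + zlib.MAX_WBITS)

# One compressed byte is not even a complete gzip header: no output yet.
first = decoder.decompress(compressed[:1])

# The remaining compressed bytes expand to the whole payload.
rest = decoder.decompress(compressed[1:])

print("fed 1 byte, got", len(first), "bytes")
print("fed", len(compressed) - 1, "bytes, got", len(rest), "bytes")
```

If that assumption holds, the smallest chunk size that yields any output depends on the length of the gzip header and on how the server compressed the body, which would explain why it varies between hosts.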

The problem with the second case (read(500) returning 1056 bytes) is that the file-object protocol defines the first parameter of read() as “Read at most size bytes from the file”. HTTPResponse, however, returns more. lxml, for example, seems to have internal buffer overflows if read() returns more bytes than it requested.
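
As a possible workaround, a small buffering adapter can enforce the "at most size bytes" contract on top of a stream that does not. This is a minimal sketch under stated assumptions: `StrictReader` and `SloppyStream` are hypothetical names, and the adapter relies on a caller-supplied `exhausted` callable to tell a temporarily empty read apart from a real end of stream. How to implement that check for a real HTTPResponse is not covered here.

```python
class SloppyStream:
    """Hypothetical stand-in: read(n) may return 0 or more than n bytes."""

    def __init__(self, chunks):
        self._chunks = list(chunks)

    def read(self, size=-1):
        # Deliberately ignores `size`, like the decode_content=True case.
        return self._chunks.pop(0) if self._chunks else b""

    def exhausted(self):
        return not self._chunks


class StrictReader:
    """Adapter whose read(size) never returns more than `size` bytes and
    returns b"" only once the underlying stream is really exhausted."""

    def __init__(self, raw, exhausted):
        self._raw = raw
        self._exhausted = exhausted  # callable: True when no data is left
        self._buffer = b""

    def read(self, size=-1):
        # Pull until enough bytes are buffered or the source is done.
        while (size < 0 or len(self._buffer) < size) and not self._exhausted():
            self._buffer += self._raw.read(size if size > 0 else 8192)
        if size < 0:
            data, self._buffer = self._buffer, b""
        else:
            data, self._buffer = self._buffer[:size], self._buffer[size:]
        return data


source = SloppyStream([b"", b"abcdefgh", b"", b"ij"])
reader = StrictReader(source, source.exhausted)

one = reader.read(1)
five = reader.read(5)
tail = reader.read(100)
end = reader.read(1)
print(one, five, tail, end)
```

The adapter trades a little extra copying for predictable behaviour: surplus bytes stay in its buffer for the next call instead of being handed to the consumer.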

I guess at the very least, the documentation should mention that HTTPResponse is not adhering to the file-like object protocol.

Issue Analytics

Created 9 years ago
Comments: 33 (18 by maintainers)

Top GitHub Comments

sethmlarson commented, May 29, 2021

@lbt see https://github.com/urllib3/urllib3/issues/2128

lbt commented, May 29, 2021

Is this still active? It certainly still seems to be a problem when trying to move to urllib3: the response object described as “Backwards-compatible with http.client.HTTPResponse” (quote from the docs) can’t be passed to libraries that took http.client.HTTPResponse because of this issue.
