HTTPResponse does not observe file-like object protocol if decode_content=True
My understanding was that urllib3’s HTTPResponse is supposed to behave according to the file-like object protocol if preload_content=False. In particular, I would like to hand it over to another library that can read from a file-like object (e.g. lxml).
Unfortunately, HTTPResponse does not appear to implement the protocol correctly when decode_content=True. Test case:
```python
import urllib3

print('urllib3 v{}'.format(urllib3.__version__))
http = urllib3.PoolManager()

response = http.request('GET', 'https://github.com',
                        headers={'Accept-Encoding': 'gzip'},
                        preload_content=False,
                        decode_content=False)
print('decode_content=False, read(1): expected 1 bytes, got {} bytes'.format(len(response.read(1))))
print('decode_content=False, read(500): expected 500 bytes, got {} bytes'.format(len(response.read(500))))

response = http.request('GET', 'https://github.com',
                        headers={'Accept-Encoding': 'gzip'},
                        preload_content=False,
                        decode_content=True)
print('decode_content=True, read(1): expected 1 bytes, got {} bytes'.format(len(response.read(1))))
print('decode_content=True, read(500): expected 500 bytes, got {} bytes'.format(len(response.read(500))))
```
Output:
```
urllib3 v1.9
decode_content=False, read(1): expected 1 bytes, got 1 bytes
decode_content=False, read(500): expected 500 bytes, got 500 bytes
decode_content=True, read(1): expected 1 bytes, got 0 bytes
decode_content=True, read(500): expected 500 bytes, got 1056 bytes
```
The problem with the first case is that I understood .read() returning '' to mean that the stream has been consumed or closed. So a library might ask for 50 bytes, get an empty result back, and assume it has read the entire ‘file’, which would not be the case with HTTPResponse.
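To make the concern concrete: a typical file-like consumer stops as soon as read() returns an empty result. The sketch below uses a hypothetical `EarlyEmptyReader` (not urllib3 code) that mimics the reported behaviour by returning b'' once before any data has been consumed, and a hypothetical `read_all` helper standing in for the library's read loop.

```python
import io


class EarlyEmptyReader:
    """Hypothetical stand-in: returns b'' on the first read even though data remains."""

    def __init__(self, data: bytes):
        self._buf = io.BytesIO(data)
        self._first = True

    def read(self, size: int = -1) -> bytes:
        if self._first:
            self._first = False
            return b""  # looks like EOF to the caller, but nothing was consumed
        return self._buf.read(size)


def read_all(f, chunk_size: int = 50) -> bytes:
    """Typical consumer loop: an empty read() is treated as end of stream."""
    parts = []
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        parts.append(chunk)
    return b"".join(parts)


payload = b"<html>" + b"x" * 200 + b"</html>"
print(len(read_all(io.BytesIO(payload))))        # 213: a real file yields everything
print(len(read_all(EarlyEmptyReader(payload))))  # 0: the early b'' truncates the document
```

The consumer is behaving correctly under the protocol; the truncation comes entirely from the premature empty result.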
You could argue that reading just one byte is unreasonable, but what is the minimal chunk size that still works? It appears to be 84 bytes for github.com, but it differs for other hosts.
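One plausible explanation for both symptoms, assuming (not confirmed here) that read(amt) in this version reads amt bytes of *compressed* data from the socket and returns whatever the decoder produces from them: a tiny prefix of a gzip stream decodes to nothing, while a larger prefix can decode to far more than amt bytes. The sketch below models that with the standard-library zlib module; `decode_prefix` is a hypothetical helper.

```python
import gzip
import zlib

# Highly compressible body, so a small compressed prefix expands a lot.
body = b"a" * 1_000_000
compressed = gzip.compress(body)


def decode_prefix(n: int) -> bytes:
    """Decode only the first n compressed bytes, modelling a read(n) that
    counts raw bytes before decoding (an assumption, see above)."""
    d = zlib.decompressobj(16 + zlib.MAX_WBITS)  # 16 + MAX_WBITS: expect a gzip header
    return d.decompress(compressed[:n])


print(len(decode_prefix(1)))          # 0: one byte of the 10-byte gzip header yields no output
print(len(decode_prefix(500)) > 500)  # True: 500 compressed bytes expand well past 500
```

Under this model the "minimal working chunk size" is simply the point where the first decoded byte appears. That depends on the gzip header and the first deflate block of each particular response, which would explain why it differs between hosts.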
The problem with the second case is that the file-object protocol defines the first parameter of read() as “Read at most size bytes from the file”, but HTTPResponse returns more. lxml, for example, seems to suffer internal buffer overflows if read() returns more bytes than it requested.
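A minimal sketch of a reader that satisfies both requirements, assuming the body is gzip-encoded and available as a raw file-like object `fp`. `GzipDecodingRaw` is a hypothetical adapter, not part of urllib3: it buffers decoded bytes and exposes them through io.RawIOBase.readinto(), so it never hands back more than the caller's buffer holds and only returns 0 once the input is exhausted. Wrapping it in the standard io.BufferedReader then provides read(n), readline() and friends with the usual semantics.

```python
import gzip
import io
import zlib


class GzipDecodingRaw(io.RawIOBase):
    """Hypothetical adapter: decodes a gzip stream from `fp` lazily."""

    def __init__(self, fp, chunk_size: int = 8192):
        self._fp = fp
        self._chunk_size = chunk_size
        self._decoder = zlib.decompressobj(16 + zlib.MAX_WBITS)  # expect a gzip header
        self._pending = b""  # decoded bytes not yet handed to the caller
        self._eof = False

    def readable(self) -> bool:
        return True

    def readinto(self, b) -> int:
        # Keep pulling compressed input until something decodes or input runs out,
        # so an empty result is never returned while data remains.
        while not self._pending and not self._eof:
            raw = self._fp.read(self._chunk_size)
            if raw:
                self._pending = self._decoder.decompress(raw)
            else:
                self._pending = self._decoder.flush()
                self._eof = True
        n = min(len(b), len(self._pending))  # never more than the caller asked for
        b[:n] = self._pending[:n]
        self._pending = self._pending[n:]
        return n  # 0 only at true end of stream


body = b"<root>" + b"<item/>" * 1000 + b"</root>"
stream = io.BufferedReader(GzipDecodingRaw(io.BytesIO(gzip.compress(body))))
first = stream.read(1)
rest = stream.read(500)
print(len(first), len(rest))  # 1 500
```

Implementing readinto() on a RawIOBase, rather than read() directly, lets the standard library's buffering layer take care of the "at most size bytes" contract instead of reimplementing it.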
I guess at the very least, the documentation should mention that HTTPResponse is not adhering to the file-like object protocol.
Issue created 9 years ago; 33 comments (18 by maintainers).

@lbt see https://github.com/urllib3/urllib3/issues/2128
Is this still active? It certainly still seems to be a problem when moving to urllib3: although the docs describe HTTPResponse as “Backwards-compatible with http.client.HTTPResponse”, it can’t be passed to libraries that accepted an http.client.HTTPResponse because of this issue.