HTTPResponse.read() closes the response causing problem with io.BufferedReader

In the following text I mostly refer to Python 2 for clarity, but the problem also happens on Python 3. A library hands me an active streaming urllib3.HTTPResponse created with preload_content=False. The response refers to a very large CSV file, so I want to read it line by line, passing it through csv.reader. csv.reader requires unicode objects, but HTTPResponse returns bytes objects, so I wrap it in io.TextIOWrapper. (Actually I use backports.csv to achieve compatibility with io.)

Now, here is the problem:

test.csv - notice the last line is missing a trailing line separator:

abc<LF>
def

The file is served with python -m SimpleHTTPServer. Here is the test code, which triggers the problem even without the csv module:

import io
import urllib3

http = urllib3.PoolManager()
resp = http.request('GET', 'http://localhost:8000/test.csv', preload_content=False)

for line in iter(
	io.TextIOWrapper(
		# py2's implementation of TextIOWrapper requires `read1` method which is provided by `BufferedReader` wrapper
		io.BufferedReader(
			resp
		)
	)
):
	print(repr(line))

Here is the error:

u'abc\n'
Traceback (most recent call last):
  File "test.py", line 11, in <module>
    resp
ValueError: I/O operation on closed file.

If I remove TextIOWrapper then the error is slightly different but with the same meaning:

ValueError: readline of closed file

From what I could find out, the C implementation of readline checks whether the file is closed. If (and only if) the last line is not terminated, it tries to read “the rest”, which results in this error because HTTPResponse.read() closes the “file” once it has finished.

This behaviour is contrary to ordinary file behaviour (including io.open() and py3 file objects): an ordinary file is never closed of its own accord and instead keeps returning an empty string when read past the end of the file.
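
The failure can be reproduced without a server. The following is a minimal sketch for Python 3, assuming a hypothetical `SelfClosingRaw` class that stands in for the streaming response by reporting `closed` as soon as its last byte has been read; it is not urllib3 code. How many lines come back before the `ValueError` may depend on the interpreter version.

```python
import io


class SelfClosingRaw(io.RawIOBase):
    """Hypothetical stand-in for a streaming response body.

    It reports itself as closed as soon as the last byte has been handed out.
    """

    def __init__(self, payload):
        super().__init__()
        self._source = io.BytesIO(payload)
        self._remaining = len(payload)

    def readable(self):
        return True

    @property
    def closed(self):
        # Assumption for this sketch: "fully consumed" is reported as "closed".
        return self._remaining == 0

    def readinto(self, buffer):
        # Returns 0 once the payload is exhausted.
        count = self._source.readinto(buffer)
        self._remaining -= count
        return count


def read_lines(raw):
    """Collect lines until EOF or until the io module refuses to continue."""
    reader = io.BufferedReader(raw)
    lines, error = [], None
    try:
        while True:
            line = reader.readline()
            if not line:
                break
            lines.append(line)
    except ValueError as exc:
        error = exc
    return lines, error


# Same content as test.csv: the last line has no trailing separator.
lines, error = read_lines(SelfClosingRaw(b"abc\ndef"))
print(lines, repr(error))
```

The loop never reaches a clean EOF: the buffered reader sees `closed` turn true and raises instead of returning an empty result.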

Which would be the best way to make streaming HTTPResponse work properly with io.BufferedReader?

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 15 (12 by maintainers)

Top GitHub Comments

7 reactions
haikuginger commented, Jan 3, 2018

Okay, so what we have here is an API mismatch. An HTTP response body can generally be treated as a file which…

  • Is read-only
  • Is not seekable
  • Closes itself immediately upon reading the last byte (signaled to the io module via the .closed property)

Most of these attributes are perfectly normal and expected for a file-like object. However, self-closing is not.

| | Behavior if read when closed | EOF behavior | Read on EOF behavior |
| --- | --- | --- | --- |
| File | Exception | Remains open | Returns empty byte array |
| HTTP response | Returns empty byte array | Closes | Returns empty byte array |
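
The "File" row can be checked with an in-memory stream. A small sketch, using io.BytesIO as the ordinary file-like object (the variable names are only illustrative):

```python
import io

stream = io.BytesIO(b"abc\ndef")

body = stream.read()           # consume everything
after_eof = stream.read()      # reading at EOF returns empty bytes
still_open = not stream.closed # reaching EOF does not close the stream

stream.close()
try:
    stream.read()              # reading after an explicit close raises
    closed_error = None
except ValueError as exc:
    closed_error = exc

print(body, after_eof, still_open, repr(closed_error))
```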

Generally, a file is open when it can be interacted with. When it’s closed, a file pointer raises exceptions when it’s interacted with. Thus, we have a bit of a consistency problem. We can (and should) be able to interact with an HTTPResponse after its body has been completely read, but we’re currently acting as though the file is closed immediately upon consumption.

Essentially, we’re using .closed to telegraph that a file has been consumed, but the standard behavior for a (read-only, non-seekable) file which has been completely consumed (but which remains open) is to return an empty set of bytes. A file being closed has a semantically different meaning - it means that the file can no longer be interacted with in any way.

I propose that we change the behavior of .close() and .closed on HTTPResponse to more closely match the semantic meaning intended by IOBase. Calling .close() will result in a call to the underlying httplib response object’s .close() method, followed by a call to super().close(), while .closed will simply inherit from IOBase; if the response has not been specifically closed, it will appear open - which is the more conservative move, as it will never appear closed in a case where the httplib fp remains open (and therefore needs to be explicitly closed).

This table summarizes behavior before and after this change:

| | close() | closed |
| --- | --- | --- |
| Before | Close underlying HTTP fp | Checks that underlying HTTP fp has been closed |
| After | Close underlying HTTP fp and set status to closed | Check that status is set to closed |
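
As a rough illustration of the "After" row, here is a sketch of a raw stream whose closed state changes only on an explicit close(). The class name `ExplicitCloseBody` and the use of io.BytesIO as the underlying fp are assumptions made for the example, not urllib3's actual implementation.

```python
import io


class ExplicitCloseBody(io.RawIOBase):
    """Hypothetical response body whose `closed` flag only changes on close()."""

    def __init__(self, fp):
        super().__init__()
        self._fp = fp  # stand-in for the underlying httplib response object

    def readable(self):
        return True

    def readinto(self, buffer):
        # Returns 0 at EOF without touching the closed state.
        return self._fp.readinto(buffer)

    def close(self):
        # Close the underlying fp first, then let IOBase record the closed state.
        self._fp.close()
        super().close()

    # `closed` is deliberately not overridden: it is inherited from IOBase.


body = ExplicitCloseBody(io.BytesIO(b"abc\ndef"))
text = io.TextIOWrapper(io.BufferedReader(body), encoding="utf-8")

lines = list(text)                 # the unterminated last line is read normally
open_after_eof = not body.closed   # still open after the body is consumed
text.close()                       # closes the BufferedReader, then `body` and its fp

print(lines, open_after_eof, body.closed)
```

With these semantics the io wrappers see an ordinary EOF (an empty read) instead of a closed stream, so iteration ends cleanly and the stream is closed only when the caller asks for it.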

@nateprewitt, you’ve investigated this in the past in #977. What are your thoughts? Also CC @sigmavirus24 and @Lukasa.

2 reactions
sethmlarson commented, Jan 3, 2018

Lots of good information and investigation here, nice work @haikuginger. You know when tables are used in a GitHub issue comment that shit’s getting serious. 😃

I’m in favor of the change as long as it goes in as a breaking change.
