ChunkedEncodingError

Just spotted the following in the logs for a pair of my streamers:

Traceback (most recent call last):
  File "/home/keyz/tweets/tweetstream.py", line 20, in <module>
    stream.statuses.filter(locations=location)
  File "/usr/local/lib/python2.7/dist-packages/twython/streaming/types.py", line 65, in filter
    self.streamer._request(url, 'POST', params=params)
  File "/usr/local/lib/python2.7/dist-packages/twython/streaming/api.py", line 134, in _request
    for line in response.iter_lines(self.chunk_size):
  File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 602, in iter_lines
    decode_unicode=decode_unicode):
  File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 575, in generate
    raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: IncompleteRead(0 bytes read)

The line in question is

for line in response.iter_lines(self.chunk_size)

in https://github.com/ryanmcgrath/twython/blob/master/twython/streaming/api.py

Should there be some catch for this to pass it to on_error, rather than throwing an uncaught exception?

Looks like the exception is a fairly new one, see https://github.com/kennethreitz/requests/pull/1498.

I’d submit a patch, but I’m not sure of the best way to catch this. Putting a try block around the whole loop seems messy.
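One way to avoid a try block around the whole loop body is to wrap only the iterator in a small generator that catches the read error and hands it to an error callback. The following is a minimal sketch under stated assumptions, not Twython's actual code: `guarded_lines` and the single-argument `on_error` callback are hypothetical, and the standard library's `http.client.IncompleteRead` stands in for `requests.exceptions.ChunkedEncodingError` so that the sketch runs without third-party packages (written for Python 3).

```python
from http.client import IncompleteRead


def guarded_lines(lines, on_error, catch=(IncompleteRead,)):
    """Yield items from `lines`; on a caught read error, report it and stop.

    Hypothetical helper, not part of Twython or requests.
    """
    iterator = iter(lines)
    while True:
        try:
            line = next(iterator)
        except StopIteration:
            return  # the stream ended normally
        except catch as exc:
            on_error(exc)  # hand the failure to the caller's error hook
            return
        yield line


def broken_stream():
    """Simulated stream that dies mid-read, like the traceback above."""
    yield b'{"id": 1}'
    yield b'{"id": 2}'
    raise IncompleteRead(b"")


errors = []
received = list(guarded_lines(broken_stream(), errors.append))
```

In the streaming loop this would presumably look like `for line in guarded_lines(response.iter_lines(self.chunk_size), handler)`, where `handler` is a hypothetical callback and `catch` includes `requests.exceptions.ChunkedEncodingError`. The loop body itself stays unchanged.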

Issue Analytics

Created 10 years ago
Comments: 54 (10 by maintainers)

Top GitHub Comments

colditzjb commented, Dec 13, 2014

I’ve been able to process slow-flowing streams (i.e., searching for rare keywords) for weeks at a time, collecting tens of thousands of tweets with no problems. I first encountered this particular error during a test run with very broad/popular search terms, which streamed a lot of tweets very fast. I’ve been able to replicate the error pretty consistently by increasing the rate of data streaming (especially on computers with slower processors). From what I can tell, this error is directly related to the Twitter API disconnecting due to queue overload:

A client reads data too slowly. Every streaming connection is backed by a queue of messages to be sent to the client. If this queue grows too large over time, the connection will be closed.

Here’s an example where I recorded stream latency for a particularly fast stream on a particularly slow computer. You’ll notice that stream latency grows to a peak (red points = data collected) and then drops off, resulting in many seconds of lost data:

[Figure: 2014-12-12_latency]

Note: x-axis: Data Collection Time, parsed from the tweet’s timestamp; y-axis: Latency = actual clock time - tweet’s timestamp.

Each of those peaks directly corresponds with a Chunked Encoding Error. When this happens, Twitter’s streaming queue dumps and you start over in real time (if you immediately restart the streamer)… but you lose as many seconds of data as you had latency.
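The latency measure described above (actual clock time minus the tweet's timestamp) can be sketched as follows. This is an illustration only, written for Python 3: the function name is hypothetical, and the `created_at` format string is an assumption based on the classic Twitter API payload layout (for example `Fri Dec 12 20:00:00 +0000 2014`).

```python
from datetime import datetime, timezone

# Assumed layout of the tweet's `created_at` field,
# e.g. "Fri Dec 12 20:00:00 +0000 2014".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def stream_latency_seconds(created_at, now=None):
    """Return clock time minus the tweet's timestamp, in seconds."""
    tweet_time = datetime.strptime(created_at, CREATED_AT_FORMAT)
    if now is None:
        now = datetime.now(timezone.utc)  # timezone-aware current time
    return (now - tweet_time).total_seconds()


# A tweet stamped 20:00:00 UTC that is processed at 20:00:30 UTC.
received_at = datetime(2014, 12, 12, 20, 0, 30, tzinfo=timezone.utc)
latency = stream_latency_seconds("Fri Dec 12 20:00:00 +0000 2014", received_at)
```

Logging this value for each tweet would make it possible to see the queue falling behind before the connection is dropped.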

If you want to avoid this issue, your best bet is to eliminate extra processes that slow down your ability to retrieve streaming data. Stream the JSON data directly to storage, then use a secondary process to parse it as needed. If you can narrow the filter terms, that would also help to slow the stream of data. Alternately, you could get a dedicated server with more processing power.
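As a rough sketch of the "store first, parse later" idea, the streaming side can append each raw line to a file and leave all JSON parsing to a separate step. The names `append_raw` and `parse_stored` and the sample payloads are hypothetical, and the code assumes Python 3 and newline-delimited JSON. How the raw lines are obtained from the streaming client is left out.

```python
import json
import os
import tempfile


def append_raw(line, handle):
    """Write one raw JSON line to storage without parsing it."""
    if line:  # skip empty lines
        handle.write(line.rstrip(b"\r\n") + b"\n")


def parse_stored(path):
    """Secondary step: parse stored lines later, off the streaming path."""
    with open(path, "rb") as handle:
        return [json.loads(raw) for raw in handle if raw.strip()]


# Simulated raw stream lines (hypothetical payloads).
incoming = [b'{"id": 1, "text": "foo"}', b"", b'{"id": 2, "text": "bar"}\r\n']

fd, raw_path = tempfile.mkstemp(suffix=".jsonl")
with os.fdopen(fd, "wb") as out:
    for raw_line in incoming:
        append_raw(raw_line, out)  # the only work done while streaming

tweets = parse_stored(raw_path)  # could run in a separate process
os.remove(raw_path)
```

Keeping the write path this small leaves the reader loop with little work per message, which is the point of the advice above.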

If you’re not too worried about data loss, this is a crude solution that got me back up and running quickly. I altered my own code instead of updating the underlying Twython code. A similar solution was mentioned early on in the thread, but I didn’t see any example code for it. Enclose the stream.statuses.filter() call in a while loop with an exception handler, like this:

(Example works in Python version 2.7)

# import sys  # Do this if you want to log error output

while True:  # Endless loop: personalize to suit your own purposes
    try:
        stream.statuses.filter(track='foo bar,foobar,more search strings here')
    except Exception:  # A bare "except:" would also swallow Ctrl-C (KeyboardInterrupt)
        # e = sys.exc_info()[0]  # Get exception info (optional)
        # print 'ERROR:', e  # Print exception info (optional)
        continue
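A slightly more defensive variant of the same restart loop retries only on listed exception types, waits longer after each failure, and gives up after a fixed number of restarts. This is a sketch under stated assumptions rather than a drop-in fix: `run_with_restarts` and its parameters are hypothetical, it targets Python 3, and `http.client.IncompleteRead` again stands in for `requests.exceptions.ChunkedEncodingError` so the example runs on the standard library alone.

```python
import time
from http.client import IncompleteRead


def run_with_restarts(start_stream, retry_on=(IncompleteRead,),
                      max_restarts=5, base_delay=1.0, sleep=time.sleep):
    """Call `start_stream` again after a listed error, with growing delays.

    Returns the number of restarts performed. Other exceptions, including
    KeyboardInterrupt, propagate so the loop can still be stopped.
    """
    restarts = 0
    while True:
        try:
            start_stream()
            return restarts  # the stream ended normally
        except retry_on:
            if restarts >= max_restarts:
                raise  # give up and surface the last error
            sleep(base_delay * 2 ** restarts)  # exponential backoff
            restarts += 1


# Simulated stream that fails twice, then completes.
attempts = []
delays = []


def flaky_stream():
    attempts.append(1)
    if len(attempts) < 3:
        raise IncompleteRead(b"")


restart_count = run_with_restarts(flaky_stream, sleep=delays.append)
```

With the real client, `start_stream` would be something like `lambda: stream.statuses.filter(track='foo bar')`, and `retry_on` would list `requests.exceptions.ChunkedEncodingError`. Data that arrives during the delay is still lost, as described above.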