Stream implementation is inconvenient
Many Reddit bots operate as a service that needs to stream in new content in real time. Often, we want the bot to be unkillable: if it disconnects from the internet, it should keep trying to reconnect; if it finds that it cannot handle a particular item, it’s probably in its best interest to just ignore it. We want the bot to stay alive as much as possible.
To facilitate streaming, PRAW provides a high-level streaming function that lets a bot creator focus on bot behaviour rather than on filtering out older or already-seen items. Unfortunately, because it lacks exception handling, the stream generator frequently breaks: when an exception is raised inside a generator, the generator is finished, and every further attempt to get an item from it raises StopIteration.
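This is ordinary Python generator behaviour, not something specific to PRAW. The toy example below (no PRAW involved; numbers is a made-up generator) shows it: once an exception escapes a generator, the generator is closed, and later next() calls raise StopIteration.

```python
def numbers():
    """Toy generator: yields 1, then fails like a stream hitting a network error."""
    yield 1
    raise RuntimeError("simulated network error")


gen = numbers()
print(next(gen))  # 1

try:
    next(gen)  # the exception propagates out of the generator...
except RuntimeError as exc:
    print("caught:", exc)

# ...and the generator is now finished: every later next() raises StopIteration.
try:
    next(gen)
except StopIteration:
    print("generator exhausted")

# A for loop over the dead generator simply does nothing.
print(list(gen))  # []
```

This is why catching the exception in the consumer is not enough: the generator itself can’t be revived.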
In the case of stream_generator, if the stream dies, the bot can no longer continue running its service unless it recreates the stream.
Current approaches to streaming
Ideally, a stream object should only need to be created once…
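```python
import praw
import prawcore

# Assumes `subreddit` is a praw.models.Subreddit, e.g. praw.Reddit(...).subreddit("test")
submission_stream = subreddit.stream.submissions(pause_after=None, skip_existing=True)
while True:
    try:
        # Once the generator has died, this for loop ends at once on every pass.
        for submission in submission_stream:
            print(submission)
    except (praw.exceptions.PRAWException, prawcore.exceptions.PrawcoreException) as e:
        print('praw/stream related exception')
    except Exception:
        print('bot related exception')
```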
Under the current stream implementation, however, this setup turns out to be unstable: once an exception is raised inside the stream, the generator is finished, so every later pass through the while loop finds it empty and the bot silently stops processing new submissions.
It’s currently more viable to set things up this way:
```python
while True:
    try:
        for submission in subreddit.stream.submissions(pause_after=None, skip_existing=True):
            print(submission)
    except (praw.exceptions.PRAWException, prawcore.exceptions.PrawcoreException) as e:
        print('praw/stream related exception')
    except Exception:
        print('bot related exception')
```
The bot doesn’t break so easily now because if the stream breaks it’ll just be recreated. The bot is stable and code is manageable so far.
But what if we want to do a double stream? Would a similar approach work?
```python
while True:
    try:
        for submission in subreddit.stream.submissions(pause_after=-1, skip_existing=True):
            if submission is None:
                break
            print(submission)
        for comment in subreddit.stream.comments(pause_after=-1, skip_existing=True):
            if comment is None:
                break
            print(comment)
    except (praw.exceptions.PRAWException, prawcore.exceptions.PrawcoreException) as e:
        print('praw/stream related exception')
    except Exception:
        print('bot related exception')
```
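It turns out the same strategy doesn’t work here: both streams yield None the whole time. With pause_after=-1, a stream yields None after the items from each response, and with skip_existing=True the items in its first response are skipped, so a freshly created stream always yields None first. Because the loop recreates both streams on every pass, the break fires immediately and no item is ever processed. If we try to fix this by removing skip_existing=True, we’d suddenly be dealing with old and duplicate items, which is exactly what the stream is supposed to handle for us. We could go back to defining the streams outside the loop, but then we’d face the same problem as before, where a single exception breaks things.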
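The None-only behaviour of the double stream can be reproduced offline with a deliberately simplified model of a stream. This is not PRAW’s stream_generator: toy_stream only mimics two documented behaviours (skip_existing skipping the first response, and pause_after=-1 yielding None after each response), and fetch_new is a hypothetical stand-in for a listing request.

```python
from typing import Callable, Iterator, List, Optional


def toy_stream(fetch: Callable[[], List[str]],
               skip_existing: bool = False) -> Iterator[Optional[str]]:
    """Simplified model of a stream created with pause_after=-1 (NOT PRAW's code).

    Only two behaviours are modelled:
    * skip_existing: items in the first response are marked as seen but not yielded.
    * pause_after=-1: None is yielded after every response.
    """
    seen = set()
    while True:
        for item in fetch():
            if item in seen:
                continue
            seen.add(item)
            if not skip_existing:
                yield item
        skip_existing = False  # only the very first response is skipped
        yield None


def fetch_new() -> List[str]:
    # Hypothetical listing whose contents never change.
    return ["t3_a", "t3_b", "t3_c"]


# Double-stream pattern: a fresh stream on every pass always starts with None.
fresh = [next(toy_stream(fetch_new, skip_existing=True)) for _ in range(3)]
print(fresh)  # [None, None, None]

# A stream that is kept alive across passes does deliver later items.
listing = ["t3_a", "t3_b"]
persistent = toy_stream(lambda: list(listing), skip_existing=True)
print(next(persistent))  # None (first response skipped)
listing.append("t3_c")
print(next(persistent))  # t3_c
```

Only the long-lived stream ever gets past its first response, which is why the streams have to outlive a single pass of the loop.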
There are three real options here:
- Recreate the streams when an exception happens (and have less manageable code, while being inconsistent with how a single stream is set up).
- Put each stream in its own script (and require that multiple scripts be started to achieve the full bot behaviour, rather than it being an option). This is currently the recommended approach as per the docs.
- Don’t use streams at all. Filter manually.
Clearly, these are all terrible workarounds. If we want something better, the stream’s implementation has to change.
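For concreteness, the first workaround might look roughly like the sketch below. It runs offline: StreamError is a stand-in for the PRAW/prawcore exceptions, make_streams is a hypothetical factory, and passes only exists so the sketch terminates where a bot would use while True.

```python
from typing import Callable, Iterator, List, Optional, Tuple

ItemStream = Iterator[Optional[str]]


class StreamError(Exception):
    """Stand-in for praw/prawcore exceptions in this offline sketch."""


def run_double_stream(make_streams: Callable[[], Tuple[ItemStream, ItemStream]],
                      passes: int) -> List[str]:
    """Workaround 1: keep two long-lived streams and rebuild both after an error.

    make_streams is hypothetical; in a bot it might return
    (subreddit.stream.submissions(pause_after=-1, skip_existing=True),
     subreddit.stream.comments(pause_after=-1, skip_existing=True)).
    Bot related exceptions would need their own except clause.
    """
    handled: List[str] = []
    submissions, comments = make_streams()
    for _ in range(passes):
        try:
            for item in submissions:
                if item is None:
                    break
                handled.append(item)
            for item in comments:
                if item is None:
                    break
                handled.append(item)
        except StreamError as exc:
            # A broken generator can't be revived, so both streams are rebuilt.
            print("stream related exception, recreating streams:", exc)
            submissions, comments = make_streams()
    return handled
```

Note that every recreation with skip_existing=True also skips whatever arrived while the old stream was down, on top of the extra bookkeeping in the except clause.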
Designing a better streaming solution
The main problem with our current stream generator is its lack of exception handling.
Since there’s no way for consumer code to intercept an exception raised inside a generator without the generator breaking, the exception handling needs to be written within the generator. At the same time, we don’t want the exception handling logic to be predetermined: it’s important to give the user a way to listen to exceptions that come from the stream. Given stream_generator’s current implementation, this may require more than minor changes to support.
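The core idea can be sketched as a generator that catches errors from each request itself and passes them to a user-supplied callback instead of letting them escape. This is an illustrative sketch, not PRAW code: resilient_stream, fetch (one request, e.g. one listing call) and on_error are hypothetical names, and deduplication is left out.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def resilient_stream(fetch: Callable[[], Iterable[T]],
                     on_error: Callable[[Exception], None],
                     retry_delay: float = 0.0) -> Iterator[Optional[T]]:
    """Catch errors from fetch *inside* the generator so the generator survives.

    Yields None after every round (similar to pause_after=-1) so callers can
    interleave several streams.
    """
    while True:
        try:
            batch = list(fetch())  # materialise so request errors surface here
        except Exception as exc:  # the user decides what a stream error means
            on_error(exc)
            time.sleep(retry_delay)
            batch = []
        for item in batch:
            yield item
        yield None
```

Exceptions raised in the consumer’s loop body are raised in the consumer’s frame, not the generator’s, so bot errors no longer kill the stream; only errors inside the generator could, and those are now routed to on_error.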
If we’re going to change streaming now, there are other inconveniences about the current streaming system that we may as well address…
Since the stream generator is intended to aid in bot making, we’d want to ensure that a new streaming object, if made, would have characteristics that maximise its usefulness in bot making. Namely:
- A streaming object shouldn’t need to be recreated.
- The stream should be as resilient as possible and should not stop yielding items even after an exception occurs.
- Errors related to the stream should preferably be dumped into a separate place from bot related errors.
- The stream should avoid yielding items older than when the stream started (skip_existing=True by default).
- After the stream starts, it should try its hardest to yield as many items as possible, up to a provided threshold.
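The "no old or duplicate items" requirements can be met with a bounded record of recently seen IDs. PRAW’s own stream_generator tracks seen items in a similar bounded set; the BoundedSeen class and filter_new function below are an independent, hypothetical sketch of the idea, with each batch standing in for one listing response.

```python
from collections import OrderedDict
from typing import Hashable, Iterable, Iterator, List


class BoundedSeen:
    """Remember only the most recent max_items ids, evicting the oldest first."""

    def __init__(self, max_items: int = 300) -> None:
        self.max_items = max_items
        self._ids: "OrderedDict[Hashable, None]" = OrderedDict()

    def add(self, item_id: Hashable) -> bool:
        """Record item_id; return True only if it was not already remembered."""
        if item_id in self._ids:
            self._ids.move_to_end(item_id)  # refresh so it is evicted last
            return False
        self._ids[item_id] = None
        if len(self._ids) > self.max_items:
            self._ids.popitem(last=False)  # drop the oldest id
        return True


def filter_new(batches: Iterable[List[str]], skip_existing: bool = True,
               max_items: int = 300) -> Iterator[str]:
    """Yield ids not seen before; with skip_existing, the first batch only primes the set."""
    seen = BoundedSeen(max_items)
    for index, batch in enumerate(batches):
        for item_id in batch:
            is_new = seen.add(item_id)
            if is_new and not (skip_existing and index == 0):
                yield item_id


print(list(filter_new([["a", "b"], ["b", "c"], ["c", "d"]])))  # ['c', 'd']
```

Bounding the set keeps memory constant for a long-running bot; the trade-off is that an ID evicted long ago would be treated as new if it ever reappeared.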
With all this in mind, this is how I envision an ideal streaming program to look:
```python
# Wishful thinking: the new stream object is returned
#submission_stream = subreddit.stream.submissions()
#comment_stream = subreddit.stream.comments()
submission_stream = Stream(subreddit.new)
comment_stream = Stream(subreddit.comments)

@submission_stream.handler()
@comment_stream.handler()
def error_handler(exception):
    print('praw/stream related exception')

while True:
    try:
        for submission in submission_stream:
            if submission is None:
                break
            print(submission)
        for comment in comment_stream:
            if comment is None:
                break
            print(comment)
    except Exception:
        print('bot related exception')
```
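To make the wished-for API concrete, here is a minimal, hypothetical Stream class that supports the usage above. It is not PRAW API and not the author’s draft. Assumptions: function is a listing callable accepting limit=... and returning items newest first (as subreddit.new does), and key extracts a unique ID (for PRAW objects, lambda item: item.fullname would do).

```python
from collections import OrderedDict
from typing import Any, Callable, Iterator, List, Optional

Handler = Callable[[Exception], None]


class Stream:
    """Hypothetical sketch of the proposed Stream object (not PRAW API, not the draft).

    Handlers must not raise; an exception escaping a handler would end the stream.
    """

    def __init__(self, function: Callable[..., Any],
                 key: Callable[[Any], Any] = lambda item: item,
                 skip_existing: bool = True, limit: int = 100,
                 max_seen: int = 300) -> None:
        self.function = function
        self.key = key
        self.limit = limit
        self.max_seen = max_seen
        self._handlers: List[Handler] = []
        # One generator for the object's whole lifetime; it is never recreated.
        self._gen = self._generate(skip_existing)

    def handler(self) -> Callable[[Handler], Handler]:
        """Decorator factory that registers an error handler and returns it unchanged."""
        def register(fn: Handler) -> Handler:
            self._handlers.append(fn)
            return fn
        return register

    def __iter__(self) -> "Stream":
        return self  # iterating again resumes the same generator

    def __next__(self) -> Optional[Any]:
        return next(self._gen)

    def _generate(self, skip_existing: bool) -> Iterator[Optional[Any]]:
        seen: "OrderedDict[Any, None]" = OrderedDict()
        while True:
            try:
                batch = list(self.function(limit=self.limit))
            except Exception as exc:  # stream errors go to the handlers
                for handle in self._handlers:
                    handle(exc)
                batch = []
            for item in reversed(batch):  # oldest first
                item_id = self.key(item)
                if item_id in seen:
                    continue
                seen[item_id] = None
                if len(seen) > self.max_seen:
                    seen.popitem(last=False)
                if not skip_existing:
                    yield item
            skip_existing = False
            # Like pause_after=-1: hand control back once per request.
            # A real version would also add a delay or backoff between requests.
            yield None
```

Because __iter__ returns the object itself, each for loop in the while loop resumes the same underlying generator, and because the handler decorator returns the function unchanged, the two handler decorators can be stacked as in the wishful example.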
If this looks at all promising, please see and try out my Stream class draft here.
Issue Analytics: created 5 years ago; 5 reactions; 16 comments (7 by maintainers).
Top GitHub Comments
Hi, I am having the same problem. The streams stop if an exception occurs, and the workaround is to recreate the stream in the exception handler, which is not a very good solution.
PRAW stream generator doesn’t start after an exception occurs
This issue is stale because it has been open for 20 days with no activity. Remove the Stale label or comment or this will be closed in 10 days.