
Stream implementation is inconvenient


Many Reddit bots run as services that stream in new content in real time. Often we want such a bot to be unkillable: if it loses its internet connection it should keep trying to reconnect, and if it cannot handle a particular item it is usually best to just skip that item. The goal is to keep the bot alive as much as possible.

To facilitate streaming, PRAW provides a high-level streaming function that lets a bot author focus on bot behaviour rather than on filtering out old or already-seen items. Unfortunately, because it lacks exception handling, the stream generator breaks frequently: once an exception propagates out of a generator, that generator is finished, and every further attempt to get an item from it raises StopIteration.
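This generator behaviour is plain Python and can be reproduced without PRAW. In this sketch, `flaky_items` is a hypothetical stand-in for a stream that fails partway through; after the exception escapes, the same generator object yields nothing more.

```python
def flaky_items():
    """Hypothetical stand-in for a stream: fails before its third item."""
    yield "item 1"
    yield "item 2"
    raise ConnectionError("simulated network failure")
    yield "item 3"  # never reached


stream = flaky_items()
received = []

try:
    for item in stream:
        received.append(item)
except ConnectionError as exc:
    print("caught:", exc)  # caught: simulated network failure

# The exception finished the generator: iterating again yields nothing,
# because next() now raises StopIteration straight away.
leftover = list(stream)
print(received, leftover)  # ['item 1', 'item 2'] []
```

This is why catching the exception around the `for` loop is not enough: the handler runs, but the stream it was reading from is already finished.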

In the case of stream_generator, this means that once the stream dies, the bot can no longer run its service unless it recreates the stream.

Current approaches to streaming

Ideally, a stream object should only need to be created once…

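import praw
import prawcore

# `subreddit` is assumed to be a praw.models.Subreddit, e.g. reddit.subreddit("test")
submission_stream = subreddit.stream.submissions(pause_after=None, skip_existing=True)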

while True:
    try:
        for submission in submission_stream:
            print(submission)

    except (praw.exceptions.PRAWException, prawcore.exceptions.PrawcoreException) as e:
        print('praw/stream related exception')

    except Exception:
        print('bot related exception')

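However, under the current stream implementation this setup turns out to be unstable. Once an exception escapes the stream, the generator is finished: the for loop then ends immediately on every pass of the while loop, and the bot silently stops processing submissions without reporting any further errors.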

It’s currently more robust to create the stream inside the loop:

while True:
    try:
        for submission in subreddit.stream.submissions(pause_after=None, skip_existing=True):
            print(submission)

    except (praw.exceptions.PRAWException, prawcore.exceptions.PrawcoreException) as e:
        print('praw/stream related exception')

    except Exception:
        print('bot related exception')

The bot no longer breaks so easily, because a stream that breaks is simply recreated on the next pass of the while loop. So far the bot is stable and the code is manageable.

But what if we want a double stream, i.e. submissions and comments in the same bot? Would a similar approach work?

while True:
    try:
        for submission in subreddit.stream.submissions(pause_after=-1, skip_existing=True):
            if submission is None:
                break
            print(submission)

        for comment in subreddit.stream.comments(pause_after=-1, skip_existing=True):
            if comment is None:
                break
            print(comment)

    except (praw.exceptions.PRAWException, prawcore.exceptions.PrawcoreException) as e:
        print('praw/stream related exception')

    except Exception:
        print('bot related exception')

It turns out the same strategy won’t work here: both streams yield None every time. Because the streams are recreated on every pass, skip_existing=True makes each fresh stream discard its entire first batch of items, and pause_after=-1 makes it yield None right after that batch, so the break fires before any item is seen. If we try to fix this by removing skip_existing=True, we suddenly have to deal with old and duplicate items, which is exactly what the stream is supposed to handle for us. We could go back to creating the streams outside the loop, but then we face the original problem again: a single exception breaks them for good.
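The effect can be seen with a deliberately simplified model. This is an assumption-laden sketch, not PRAW’s actual `stream_generator`: `toy_stream` and `make_fetcher` are hypothetical. It keeps only the behaviour relevant here: already-seen items are skipped, `skip_existing=True` drops the whole first batch, and, as the PRAW docs describe for a negative `pause_after`, `None` is yielded after every batch.

```python
def toy_stream(fetch, skip_existing=True):
    """Simplified model of a stream with pause_after=-1 (not PRAW's code)."""
    seen = set()
    while True:
        for item in fetch():
            if item in seen:
                continue
            seen.add(item)
            if not skip_existing:
                yield item
        skip_existing = False  # only the very first batch is skipped
        yield None  # pause_after=-1: pause after every batch


def make_fetcher():
    """Hypothetical listing: one new post appears before every fetch."""
    posts = []

    def fetch():
        posts.append(f"post{len(posts) + 1}")
        return list(posts)  # oldest first

    return fetch


# Streams recreated on every pass, as in the double-stream loop above.
fetch_a = make_fetcher()
recreated = []
for _ in range(3):
    for item in toy_stream(fetch_a):
        if item is None:
            break
        recreated.append(item)

# One stream created once and resumed on every pass.
fetch_b = make_fetcher()
stream = toy_stream(fetch_b)
persistent = []
for _ in range(3):
    for item in stream:
        if item is None:
            break
        persistent.append(item)

print(recreated, persistent)  # [] ['post2', 'post3']
```

The recreated streams never yield a post, while the persistent one does; but the persistent one is exactly the kind of generator that a single exception finishes for good.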

There are three real workarounds here:

  1. Recreate the streams when an exception happens (which makes the code less manageable and inconsistent with how a single stream is set up).
  2. Put each stream in its own script (which makes starting multiple scripts a requirement for the full bot behaviour rather than an option). This is currently the recommended approach according to the docs.
  3. Don’t use streams at all, and filter items manually.
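Workaround 1 can at least be packaged so the bot loop keeps the shape of the single-stream version. This is a sketch under stated assumptions, not something PRAW provides: `RestartableStream` is a hypothetical wrapper that takes a factory, serves items from the current stream, and builds a fresh stream when the current one raises. In a bot, the factory might be `lambda: subreddit.stream.comments(pause_after=-1, skip_existing=True)`, and the wrapped objects would be created once, outside the loop. One trade-off: a recreated stream starts with `skip_existing=True` again, so items posted while it was down may be skipped.

```python
class RestartableStream:
    """Hypothetical wrapper that swaps in a fresh stream after a failure.

    `factory` is any zero-argument callable returning a new iterator.
    """

    def __init__(self, factory):
        self._factory = factory
        self._iterator = factory()

    def __iter__(self):
        return self

    def __next__(self):
        try:
            return next(self._iterator)
        except StopIteration:
            raise  # normal exhaustion: don't restart
        except Exception:
            # The failed generator is finished, so replace it, then re-raise
            # so the bot's own except blocks still see the error.
            self._iterator = self._factory()
            raise


# Demo with a hypothetical factory whose first stream fails after one item.
attempts = 0


def flaky_factory():
    global attempts
    attempts += 1
    attempt = attempts

    def gen():
        yield f"attempt{attempt}-a"
        if attempt == 1:
            raise ConnectionError("simulated disconnect")
        yield f"attempt{attempt}-b"

    return gen()


stream = RestartableStream(flaky_factory)
seen, errors = [], []
while True:
    try:
        for item in stream:
            seen.append(item)
        break  # the toy streams are finite; PRAW streams don't end on their own
    except ConnectionError as exc:
        errors.append(str(exc))

print(seen, errors)  # ['attempt1-a', 'attempt2-a', 'attempt2-b'] ['simulated disconnect']
```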

Clearly, these are all terrible workarounds. If we want something better, the stream’s implementation has to change.

Designing a better streaming solution

The main problem with our current stream generator is its lack of exception handling.

Since an exception raised inside a generator cannot be intercepted by the consumer without the generator breaking, the exception handling has to be written within the generator. At the same time, we don’t want the exception handling logic to be predetermined: it’s important to give the user a way to listen for exceptions that come from the stream. Given stream_generator’s current implementation, supporting this may require more than minor changes.
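One way to put the handling inside the generator is to wrap only the network call in try/except and pass any error to a user-supplied callback. This is a minimal sketch, not PRAW’s `stream_generator`: `resilient_stream`, `on_error` and `delay` are hypothetical names, `fetch` stands in for a listing call such as `subreddit.new`, and items are compared directly here, whereas PRAW tracks each item’s `fullname`.

```python
import itertools
import time


def resilient_stream(fetch, on_error, delay=1.0):
    """Hypothetical generator that handles fetch errors internally.

    Because the try/except is inside the generator, a failed fetch never
    escapes it, so the generator is never finished by an error.
    """
    seen = set()  # a real version would bound this set
    while True:
        try:
            batch = list(fetch())
        except Exception as exc:
            on_error(exc)  # let the user decide how to log or react
            time.sleep(delay)  # back off before retrying
            continue
        new_items = [item for item in batch if item not in seen]
        seen.update(new_items)
        yield from new_items
        if not new_items:
            time.sleep(delay)  # nothing new: wait before polling again


# Demo with a fake fetch that fails once (hypothetical data).
responses = iter([["p1", "p2"], ConnectionError("timeout"), ["p2", "p3"]])


def fake_fetch():
    response = next(responses)
    if isinstance(response, Exception):
        raise response
    return response


errors = []
items = list(itertools.islice(resilient_stream(fake_fetch, errors.append, delay=0), 3))
print(items, [str(e) for e in errors])  # ['p1', 'p2', 'p3'] ['timeout']
```

Because the `yield` sits outside the `try`, errors raised by the bot’s own code in the `for` loop body never pass through `on_error`; they propagate to the caller as usual and leave the generator suspended but intact, which keeps stream errors and bot errors apart.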

If we’re going to change streaming now, there are other inconveniences about the current streaming system that we may as well address…

Since the stream generator is intended to aid bot making, we’d want any new streaming object to have characteristics that maximise its usefulness for bots. Namely:

  1. A streaming object shouldn’t need to be recreated.
  2. The stream should be as resilient as possible and should not stop yielding items even after an exception occurs.
  3. Errors related to the stream should preferably be dumped into a separate place from bot related errors.
  4. The stream should avoid yielding items older than when the stream started (skip_existing=True by default).
  5. After the stream starts, it should try its hardest to yield as many items as possible, up to a provided threshold.

With all this in mind, this is how I envision an ideal streaming program:

# Wishful thinking: the new stream object is returned
#submission_stream = subreddit.stream.submissions()
#comment_stream = subreddit.stream.comments()
submission_stream = Stream(subreddit.new)
comment_stream = Stream(subreddit.comments)

@submission_stream.handler()
@comment_stream.handler()
def error_handler(exception):
    print('praw/stream related exception')

while True:
    try:
        for submission in submission_stream:
            if submission is None:
                break
            print(submission)

        for comment in comment_stream:
            if comment is None:
                break
            print(comment)

    except Exception:
        print('bot related exception')

If this looks at all promising, please see and try out my Stream class draft here.
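The draft itself is not reproduced on this page. To make the wishful usage above concrete, here is one possible shape for such an object, written as a hypothetical sketch: the `Stream` name and the `handler()` decorator come from the usage example, while `skip_existing`, `limit`, `idle_delay` and all internals are assumptions. The key design choice is to implement it as an iterator class rather than a generator, so there is no finished state for an exception to push it into.

```python
import time
from collections import deque
from types import SimpleNamespace


class Stream:
    """Hypothetical sketch of the proposed stream object (not the author's draft).

    Assumes `fetch` behaves like `subreddit.new`: it accepts `limit=` and
    returns items, newest first, that each have a unique `fullname`.
    """

    def __init__(self, fetch, skip_existing=True, limit=100, idle_delay=1.0):
        self._fetch = fetch
        self._limit = limit
        self._skip_existing = skip_existing
        self._idle_delay = idle_delay
        self._seen = set()  # a real version would cap the size of this set
        self._pending = deque()
        self._handlers = []

    def handler(self):
        """Return a decorator that registers a stream-error callback."""
        def register(func):
            self._handlers.append(func)
            return func
        return register

    def __iter__(self):
        return self

    def __next__(self):
        # Unlike a generator, this object is never "finished": even if a
        # handler re-raises, the next call to next() still works.
        if not self._pending:
            self._refill()
        if not self._pending:
            time.sleep(self._idle_delay)
            return None  # nothing new right now; the caller may break
        return self._pending.popleft()

    def _refill(self):
        try:
            batch = list(self._fetch(limit=self._limit))
        except Exception as exc:  # stream-related errors go to the handlers
            for func in self._handlers:
                func(exc)
            return
        new = [item for item in reversed(batch) if item.fullname not in self._seen]
        self._seen.update(item.fullname for item in new)
        if self._skip_existing:
            self._skip_existing = False  # drop items that predate the stream
            return
        self._pending.extend(new)


# Demo with a fake listing in place of subreddit.new (hypothetical data).
pages = iter([
    [SimpleNamespace(fullname="t3_b"), SimpleNamespace(fullname="t3_a")],
    ConnectionError("503 from server"),
    [SimpleNamespace(fullname="t3_c"), SimpleNamespace(fullname="t3_b")],
])


def fake_new(limit):
    page = next(pages)
    if isinstance(page, Exception):
        raise page
    return page


stream = Stream(fake_new, idle_delay=0)
errors = []


@stream.handler()
def error_handler(exception):
    errors.append(exception)


results = [next(stream) for _ in range(3)]
print([getattr(r, "fullname", None) for r in results], len(errors))  # [None, None, 't3_c'] 1
```

Swapping `fake_new` for `subreddit.new` or `subreddit.comments` would give the two-stream loop shown earlier, though a production version would also need a bounded seen-set and backoff on repeated errors.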

Issue Analytics

  • State: open
  • Created 5 years ago
  • Reactions: 5
  • Comments: 16 (7 by maintainers)

Top GitHub Comments

isFakeAccount commented, Oct 12, 2020 (1 reaction)

Hi, I am having the same problem. The streams stop if an exception occurs, and the workaround of creating the stream again in the exception handler is not a very good solution.

PRAW stream generator doesn’t start after an exception occurs

github-actions[bot] commented, Feb 3, 2022 (0 reactions)

This issue is stale because it has been open for 20 days with no activity. Remove the Stale label or comment or this will be closed in 10 days.
