plugins.twitch: purple screen doesn't get filtered out correctly (embedded ads)
Checklist
- This is a plugin issue and not a different kind of issue
- I have read the contribution guidelines
- I have checked the list of open and recently closed plugin issues
- I have checked the commit log of the master branch
Streamlink version
Latest build from the master branch
Description
Embedded ads meta-thread here: #3210
Twitch has, as expected, made new changes to their embedded ads system after their source code was leaked a few weeks ago.
New access token request headers were added in #4086, but, also as expected, this has stopped working, at least for non-preroll ads as far as I can tell.
The `--twitch-disable-ads` parameter still seems to be able to filter out ads, but there’s one HLS segment with the purple screen which doesn’t get caught by it, so the purple screen appears just before the stream output stops for filtering out the ads. It’s possible that some timestamps are set differently now or that they are using different values in the metadata for annotating the ad segments.
To be able to fix this, we need to know the actual HLS playlist contents when the embedded ads start.
Debug log
-
Issue Analytics
- State:
- Created 2 years ago
- Reactions:8
- Comments:7 (5 by maintainers)
This only affects the access token, which determines whether you’re seeing ads or not. Both access token request parameters cause ads, but I feel like the switch from `embed` to `site` in #4156 is causing more ads now, so I think we should revert that. It also seems like `embed` doesn’t cause any midroll ads.

However, this thread is about ads filtering, and the current access token request parameter value shows that this isn’t working 100% reliably anymore, so a fix is needed here as well. I’ve been observing HLS streams on Twitch for a bit but haven’t been able to find anything yet. My gut feeling says that the issue with the stuck filtering is caused by an `END-ON-NEXT` daterange tag, which is currently unsupported. It might also be due to an invalid playlist reload time calculation.

I’ve had another look at the issue and it’s now clear to me why Twitch’s midroll ads (if they occur - I haven’t seen one in a year or so) don’t get filtered out properly. The issue is caused by the low-latency and segment prefetch implementation, which isn’t solved in an ideal way.
The prefetching works like this: there are regular HLS segments with time and duration metadata, and there are two prefetch URLs added to the playlist if the stream is a low latency stream.
Streamlink currently clones the last regular segment and replaces its URL for each available prefetch URL and then extends the playlist by appending those prefetch segments. Since prefetch items don’t include any metadata because it is unknown ahead of time, the metadata of the last regular segment gets re-used when appending prefetch segments.
Streamlink already guesses the duration of the appended prefetch segments so that the playlist refresh times can be as low as possible, as the refresh time is determined by the duration of the last segment. However, it is currently not re-calculating the time of the prefetch segments, which means that the ad-calculation is wrong too, so those prefetch segments will never get filtered out unless the last regular segment was detected as an ad. But since Streamlink is extending the playlist with prefetch segments, the last regular segment is always a couple of seconds behind. Whether Twitch includes ads in the prefetch data is the big question here, and if they do, then this is the cause of this issue.
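To make the problem described above more concrete, here is a minimal, self-contained sketch. It is not Streamlink's actual code: `Segment`, `DateRange`, `is_ad` and `extend_with_prefetch` are hypothetical names, and the ad check is assumed to be a simple "segment start falls inside an ad daterange" test. It only illustrates that a cloned prefetch segment inherits the start time of the last regular segment and therefore never matches an ad daterange that begins after it.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass(frozen=True)
class Segment:  # hypothetical stand-in for a parsed HLS segment
    url: str
    date: datetime   # start time of the segment
    duration: float  # seconds
    prefetch: bool = False


@dataclass(frozen=True)
class DateRange:  # hypothetical stand-in for an ad daterange
    start: datetime
    duration: float  # seconds


def is_ad(segment: Segment, ad_ranges: List[DateRange]) -> bool:
    # Assumption: a segment is an ad if its start time lies inside an ad daterange.
    return any(
        r.start <= segment.date < r.start + timedelta(seconds=r.duration)
        for r in ad_ranges
    )


def extend_with_prefetch(segments: List[Segment], prefetch_urls: List[str]) -> List[Segment]:
    # Clone the last regular segment for each prefetch URL, replacing only the URL.
    # The start time is copied unchanged, which is the flaw discussed above.
    last = segments[-1]
    return segments + [replace(last, url=url, prefetch=True) for url in prefetch_urls]


t0 = datetime(2021, 1, 1, tzinfo=timezone.utc)
regular = [
    Segment("seg1.ts", t0, 2.0),
    Segment("seg2.ts", t0 + timedelta(seconds=2), 2.0),
]
# The ad break starts right after the last regular segment, i.e. where the
# first prefetch segment really begins.
ad_ranges = [DateRange(t0 + timedelta(seconds=4), 30.0)]

playlist = extend_with_prefetch(regular, ["pre1.ts", "pre2.ts"])
flags = [is_ad(s, ad_ranges) for s in playlist]
```

In this sketch both prefetch segments carry the start time of `seg2.ts`, so none of the four entries is flagged as an ad even though the prefetch data would fall inside the ad break.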
A proper prefetch implementation would not clone the last regular segment. Instead, it would only download and cache the prefetch data, and then use the cached data on the next playlist refresh for the regular segment that matches the sequence number and/or URL of the cached data. I have already successfully rewritten this locally and experimented with it a bit, but unfortunately it looks like it’s adding a delay to the stream output, because it has to wait for the next playlist refresh before it can use the cached prefetch data. This means that with this prefetch implementation, there will always be a delay equal to the duration of the last regular segment, which is about two seconds.
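As a rough illustration of the cache-based idea, the sketch below downloads prefetch URLs ahead of time and only hands the data out once the same URL shows up as a regular segment. `PrefetchCache`, `warm` and `read` are hypothetical names, matching is done by URL only, and the download function is faked; the local rewrite mentioned above is not shown here and may look quite different.

```python
from typing import Callable, Dict, Iterable, List


class PrefetchCache:
    """Hypothetical cache that holds prefetch data until the matching regular segment appears."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch
        self._cache: Dict[str, bytes] = {}

    def warm(self, prefetch_urls: Iterable[str]) -> None:
        # Download prefetch data early, but do not write it to the output yet.
        for url in prefetch_urls:
            if url not in self._cache:
                self._cache[url] = self._fetch(url)

    def read(self, url: str) -> bytes:
        # Called for a regular segment (with real metadata) on a later refresh.
        # Use the cached data if available, otherwise download as usual.
        if url in self._cache:
            return self._cache.pop(url)
        return self._fetch(url)


fetched: List[str] = []


def fake_fetch(url: str) -> bytes:
    # Stand-in for an HTTP download; records which URLs were requested.
    fetched.append(url)
    return url.encode()


cache = PrefetchCache(fake_fetch)

# Playlist refresh 1: two prefetch URLs are announced.
cache.warm(["seg11.ts", "seg12.ts"])

# Playlist refresh 2: seg11 is now a regular segment, seg13 was never prefetched.
data11 = cache.read("seg11.ts")
data13 = cache.read("seg13.ts")
```

Because the output has to wait until `read` is called for a regular segment, the data is available early but is written one refresh late, which is where the extra delay comes from.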
I will take a look at re-calculating the time and ad status of prefetch segments with the current implementation. This won’t change the low latency stuff and will hopefully fix this issue.
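A minimal sketch of what re-calculating the time of prefetch segments could look like is shown below. It is not the actual fix: `Segment`, `DateRange`, `is_ad` and `extend_with_prefetch_recalculated` are hypothetical names, the ad check is assumed to be a plain "start time inside an ad daterange" test, and each prefetch segment is assumed to have the same duration as the last regular segment rather than a separately guessed one.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass(frozen=True)
class Segment:  # hypothetical stand-in for a parsed HLS segment
    url: str
    date: datetime   # start time of the segment
    duration: float  # seconds
    prefetch: bool = False


@dataclass(frozen=True)
class DateRange:  # hypothetical stand-in for an ad daterange
    start: datetime
    duration: float  # seconds


def is_ad(segment: Segment, ad_ranges: List[DateRange]) -> bool:
    # Assumption: a segment is an ad if its start time lies inside an ad daterange.
    return any(
        r.start <= segment.date < r.start + timedelta(seconds=r.duration)
        for r in ad_ranges
    )


def extend_with_prefetch_recalculated(
    segments: List[Segment], prefetch_urls: List[str]
) -> List[Segment]:
    # Still clone the last segment, but shift each clone's start time forward
    # by the duration of the segment before it.
    result = list(segments)
    previous = segments[-1]
    for url in prefetch_urls:
        previous = replace(
            previous,
            url=url,
            date=previous.date + timedelta(seconds=previous.duration),
            prefetch=True,
        )
        result.append(previous)
    return result


t0 = datetime(2021, 1, 1, tzinfo=timezone.utc)
regular = [
    Segment("seg1.ts", t0, 2.0),
    Segment("seg2.ts", t0 + timedelta(seconds=2), 2.0),
]
ad_ranges = [DateRange(t0 + timedelta(seconds=4), 30.0)]

playlist = extend_with_prefetch_recalculated(regular, ["pre1.ts", "pre2.ts"])
flags = [is_ad(s, ad_ranges) for s in playlist]
```

With shifted start times, the two prefetch segments in this example fall inside the ad daterange and can be filtered, while the low-latency behavior of appending prefetch segments immediately stays the same.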