[YouTube] Tab Extractor may not get all pages for (very) large channels

Checklist

  • I’m reporting a broken site support issue
  • I’ve verified that I’m running youtube-dl version 2021.02.10
  • I’ve checked that all provided URLs are alive and playable in a browser
  • I’ve checked that all URLs and arguments with special characters are properly quoted or escaped
  • I’ve searched the bugtracker for similar bug reports including closed ones
  • I’ve read the bugs section in the FAQ

Verbose log

Test: channel https://www.youtube.com/user/TEDxTalks/videos with ~163,192 videos (as of writing)

[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['--verbose', '--flat-playlist', 'https://www.youtube.com/user/TEDxTalks/videos']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] youtube-dl version 2021.02.10
[debug] Python version 3.9.1 (CPython) - Linux-5.10.15-1-MANJARO-x86_64-with-glibc2.33
[debug] exe versions: ffmpeg 4.3.1, ffprobe 4.3.1, rtmpdump 2.4
[debug] Proxy map: {}
[youtube:tab] TEDxTalks: Downloading webpage
[download] Downloading playlist: TEDx Talks - Videos
[youtube:tab] Downloading page 1
[youtube:tab] Downloading page 2
[youtube:tab] Downloading page 3
[youtube:tab] Downloading page 4
[youtube:tab] Downloading page 5
[youtube:tab] Downloading page 6
[youtube:tab] Downloading page 7
[youtube:tab] Downloading page 8
[youtube:tab] Downloading page 9
[youtube:tab] Downloading page 10
[youtube:tab] Downloading page 11
[...]
[youtube:tab] Downloading page 1674
[youtube:tab] Downloading page 1675
[youtube:tab] Downloading page 1676
[youtube:tab] Downloading page 1677
[youtube:tab] Downloading page 1678
[youtube:tab] playlist TEDx Talks - Videos: Downloading 50339 videos
[download] Downloading video 1 of 50339
[download] Downloading video 2 of 50339
[download] Downloading video 3 of 50339
[...]
[download] Downloading video 50338 of 50339
[download] Downloading video 50339 of 50339
[download] Finished downloading playlist: TEDx Talks - Videos

In this case it only gathered 50339 videos.

Running it again shows that this isn’t a fixed limit imposed by YouTube:

[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['--verbose', '--flat-playlist', 'https://www.youtube.com/user/TEDxTalks/videos']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] youtube-dl version 2021.02.10
[debug] Python version 3.9.1 (CPython) - Linux-5.10.15-1-MANJARO-x86_64-with-glibc2.33
[debug] exe versions: ffmpeg 4.3.1, ffprobe 4.3.1, rtmpdump 2.4
[debug] Proxy map: {}
[youtube:tab] TEDxTalks: Downloading webpage
[download] Downloading playlist: TEDx Talks - Videos
[youtube:tab] Downloading page 1
[youtube:tab] Downloading page 2
[youtube:tab] Downloading page 3
[youtube:tab] Downloading page 4
[youtube:tab] Downloading page 5
[youtube:tab] Downloading page 6
[youtube:tab] Downloading page 7
[youtube:tab] Downloading page 8
[youtube:tab] Downloading page 9
[youtube:tab] Downloading page 10
[youtube:tab] Downloading page 11
[youtube:tab] Downloading page 12
[...]
[youtube:tab] Downloading page 2100
[youtube:tab] Downloading page 2101
[youtube:tab] Downloading page 2102
[youtube:tab] Downloading page 2103
[youtube:tab] Downloading page 2104
[youtube:tab] Downloading page 2105
[youtube:tab] Downloading page 2106
[youtube:tab] playlist TEDx Talks - Videos: Downloading 63179 videos
[download] Downloading video 1 of 63179
[download] Downloading video 2 of 63179
[download] Downloading video 3 of 63179
[download] Downloading video 4 of 63179
[download] Downloading video 5 of 63179
[download] Downloading video 6 of 63179
[download] Downloading video 7 of 63179
[download] Downloading video 8 of 63179
[...]
[download] Downloading video 63177 of 63179
[download] Downloading video 63178 of 63179
[download] Downloading video 63179 of 63179
[download] Finished downloading playlist: TEDx Talks - Videos

This time it gathered 63179 videos.

Description

I’ve looked into what I think is causing this:

When downloading tab pages, the page fetched with the continuation token from the previous page may not contain any continuation items/contents (i.e. videos). This appears to be a server-side issue with YouTube (possibly a form of rate limiting). The HTTP status for these pages is still 200.
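
Because an empty page still returns HTTP 200, whether a page is "good" has to be judged from its payload. A minimal Python sketch of that check follows. It assumes the continuation response is JSON with the videos under onResponseReceivedActions[...].appendContinuationItemsAction.continuationItems; those key names reflect YouTube's internal API as I understand it and were not checked against the linked revision, and the helper name continuation_items is hypothetical.

```python
from typing import Any, Dict, List


def continuation_items(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the continuation items of a continuation-page response.

    ASSUMPTION: items live under
    onResponseReceivedActions[*].appendContinuationItemsAction.continuationItems.
    Missing keys are treated as "no items" rather than raising.
    """
    for action in response.get("onResponseReceivedActions") or []:
        items = (action.get("appendContinuationItemsAction") or {}).get("continuationItems")
        if isinstance(items, list):
            return items
    return []


# Illustrative payloads (not real YouTube responses):
empty_page = {"responseContext": {}}  # HTTP 200, but no videos
full_page = {
    "onResponseReceivedActions": [
        {"appendContinuationItemsAction": {"continuationItems": [
            {"gridVideoRenderer": {"videoId": "abc"}},
        ]}}
    ]
}
print(len(continuation_items(empty_page)))  # 0
print(len(continuation_items(full_page)))   # 1
```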

In my testing, simply retrying the page download with the same continuation token (sometimes more than once) seems to eventually return a page with the continuation items.
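
Retrying with the same token could look roughly like the minimal Python sketch below. It is not youtube-dl's code: fetch_with_retries, make_flaky_fetch, the "items" key and the retry count are all hypothetical stand-ins. Since retrying doesn't always help (and hammering the server can get you 429'd), it gives up after a bounded number of attempts instead of looping forever.

```python
import time
from typing import Any, Callable, Dict, Optional

Page = Dict[str, Any]


def fetch_with_retries(
    fetch: Callable[[str], Page],
    token: str,
    max_retries: int = 3,
    delay: float = 0.0,
) -> Optional[Page]:
    """Fetch the page for `token`, retrying with the SAME token while the
    response has no items. Returns None if every attempt comes back empty."""
    for attempt in range(max_retries + 1):
        page = fetch(token)
        if page.get("items"):
            return page
        if attempt < max_retries:
            time.sleep(delay)  # optional back-off between attempts
    return None


def make_flaky_fetch(empty_responses: int):
    """Hypothetical stand-in for the real request: the first
    `empty_responses` calls return an empty (but "successful") page."""
    calls = {"n": 0}

    def fetch(token: str) -> Page:
        calls["n"] += 1
        if calls["n"] <= empty_responses:
            return {"items": []}
        return {"items": ["video-for-" + token]}

    return fetch, calls


fetch, calls = make_flaky_fetch(empty_responses=2)
page = fetch_with_retries(fetch, "TOKEN_A")
print(page["items"], calls["n"])  # ['video-for-TOKEN_A'] 3
```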

I’ve found this mostly happens when downloading channels with tens of thousands of videos.

This is a problem because, when there are no continuation items, youtube-dl breaks out of the page extraction loop. Here, that means youtube-dl does not get all the videos on the channel (as provided by YouTube), yet treats the extraction as complete (a false success).
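
The difference between stopping and failing can be sketched as a page loop in Python. This is a minimal illustration, not youtube-dl's code: iter_videos, IncompleteListError and the "items"/"next" page shape are hypothetical, and it assumes (as the report implies) that a page requested with a continuation token should never be empty. Retrying before raising is left out to keep the loop short.

```python
from typing import Any, Callable, Dict, Iterator, Optional


class IncompleteListError(Exception):
    """Raised instead of silently stopping on an empty continuation page."""


def iter_videos(
    fetch: Callable[[str], Dict[str, Any]],
    token: Optional[str],
) -> Iterator[str]:
    """Yield video IDs page by page.

    ASSUMPTION: a continuation token is only issued when more videos follow,
    so a page requested with a token should never be empty.
    """
    while token:
        page = fetch(token)
        items = page.get("items") or []
        if not items:
            # A plain `break` here would end the loop and report the partial
            # list as complete (the false success described above).
            raise IncompleteListError("empty continuation page for token %r" % token)
        yield from items
        token = page.get("next")  # no next token: the genuine end of the list


pages = {
    "t1": {"items": ["a", "b"], "next": "t2"},
    "t2": {"items": ["c"], "next": None},
}
print(list(iter_videos(pages.__getitem__, "t1")))  # ['a', 'b', 'c']

broken = {"t1": {"items": ["a", "b"], "next": "t2"}, "t2": {}}
try:
    list(iter_videos(broken.__getitem__, "t1"))
except IncompleteListError as exc:
    print("error:", exc)  # error: empty continuation page for token 't2'
```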

For reference, the part of the extractor I’m referring to: https://github.com/ytdl-org/youtube-dl/blob/9fc5eafb8e384453a49f7cfe73147be491f0b19d/youtube_dl/extractor/youtube.py#L2483-L2553

Issue Analytics

  • State: open
  • Created 3 years ago
  • Reactions: 3
  • Comments: 7 (6 by maintainers)

Top GitHub Comments

pukkandan commented, Feb 18, 2021 (1 reaction)

OK, I misunderstood the original problem and thought that only the continuation token was missing. When I tried to test the issue, I got 429’d (HTTP 429 Too Many Requests) 😢

coletdjnz commented, Mar 2, 2021 (0 reactions)

This doesn’t seem limited to large playlists.

As of writing, this channel is broken when downloaded with oldest-first sorting. In this particular case retrying doesn’t help (tested); however, it does show the false-success issue: after getting incomplete data for the first continuation page, youtube-dl finishes as if nothing went wrong, where I’d expect an exception to be raised.

youtube-dl --verbose --flat-playlist "https://www.youtube.com/SBTjornalismo/videos?sort=da" 
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['--verbose', '--flat-playlist', 'https://www.youtube.com/SBTjornalismo/videos?sort=da']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] youtube-dl version 2021.03.02
[debug] Python version 3.9.1 (CPython) - Linux-5.10.15-1-MANJARO-x86_64-with-glibc2.33
[debug] exe versions: ffmpeg 4.3.1, ffprobe 4.3.1, rtmpdump 2.4
[debug] Proxy map: {}
[youtube:tab] SBTjornalismo: Downloading webpage
[download] Downloading playlist: SBT Jornalismo - Videos
[youtube:tab] Downloading page 1
[youtube:tab] playlist SBT Jornalismo - Videos: Downloading 10 videos
[download] Downloading video 1 of 10
[download] Downloading video 2 of 10
[download] Downloading video 3 of 10
[download] Downloading video 4 of 10
[download] Downloading video 5 of 10
[download] Downloading video 6 of 10
[download] Downloading video 7 of 10
[download] Downloading video 8 of 10
[download] Downloading video 9 of 10
[download] Downloading video 10 of 10
[download] Finished downloading playlist: SBT Jornalismo - Videos