Youtube big playlist incomplete
Checklist
- I’m reporting a broken site support
- I’ve verified that I’m running youtube-dl version 2021.01.24.1
- I’ve checked that all provided URLs are alive and playable in a browser
- I’ve checked that all URLs and arguments with special characters are properly quoted or escaped
- I’ve searched the bugtracker for similar issues including closed ones
Verbose log
youtube-dl.exe --add-metadata --extract-audio -f bestaudio "https://www.youtube.com/playlist?list=PLnGaZWkydyfA8yC1i0RJN0iE7idaI2dv2" -o "%(title)s.%(ext)s" -ciw
[youtube:tab] PLnGaZWkydyfA8yC1i0RJN0iE7idaI2dv2: Downloading webpage
[download] Downloading playlist: Les 2 Minutes Du Peuple - Intégrale
[youtube:tab] Downloading page 1
[youtube:tab] Downloading page 2
[youtube:tab] playlist Les 2 Minutes Du Peuple - Intégrale: Downloading 256 videos
[download] Downloading video 1 of 256
Description
EXAMPLE PLAYLIST: https://www.youtube.com/playlist?list=PLnGaZWkydyfA8yC1i0RJN0iE7idaI2dv2
EXAMPLE 2: https://www.youtube.com/playlist?list=PLuxwzd2f67PQ2gz30YnoGNWgyr_8zuY-g
YouTube’s playlist webpage does not display the entire playlist for large playlists. Specific parts are missing (in the given examples, you can see that videos from index 101 to 160 are missing in both playlists, although those videos exist if you open the first video and check the full playlist on the right).
In the verbose log above, you can see that the downloaded playlist has 256 videos although it should have 373.
From what I understand, this is the webpage that youtube-dl reads to fetch the playlist’s URLs, but by doing so it doesn’t download the whole playlist. I don’t understand why YouTube doesn’t display the entire list. I checked on two different devices and internet connections and the problem occurs on both, so it seems to be global. Maybe this is meant to prevent playlist auto-download? But in that case I don’t understand why only such specific indices are affected.
Maybe youtube-dl can fetch the playlist’s URLs from another page?
Issue Analytics
- Created 3 years ago
- Reactions:2
- Comments:6 (3 by maintainers)
Top GitHub Comments
Since I am able to reproduce this issue, I’ve done a bit of debugging, in case it helps. I found a solution that works, but I don’t know the codebase well enough to tell whether it causes other issues. This particular one appears to be related to how youtube-dl handles YouTube returning slightly different data.
I am not sure whether the same thing is causing incomplete playlists later on in the download, but from debugging I get the impression that weird cases sometimes occur which youtube-dl can’t handle.
Late night debugging ramble: Note: this is with the case where it only downloads 1 page
Looking at this part: https://github.com/ytdl-org/youtube-dl/blob/a4bdc3112bf0e925afc2e512d5f23f9097f6bc7a/youtube_dl/extractor/youtube.py#L2333-L2353
I noticed that next_continuation was None and that it was returning on line 2337 as it couldn’t find contents.
It seems that in this case there is no ‘contents’ but an ‘items’ list, whose last entry contains the continuation token.
I set that to pick up the items list:
contents = renderer.get('contents') or renderer.get('items')
and it picks up the token correctly. I’ve uploaded the contents of renderer as JSON here: https://pastebin.com/xnxLRSnX
Now the output is showing:
[youtube:tab] Downloading page 1
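The ‘contents’ or ‘items’ fallback described above can be tried in isolation. What follows is only a minimal sketch, not youtube-dl’s actual code: find_continuation_token is a hypothetical helper, the sample dicts are made up, and the nesting under continuationItemRenderer is an assumption about the shape of the response.

```python
def find_continuation_token(renderer):
    """Return a continuation token from a renderer dict, or None.

    Hypothetical helper: looks at 'contents' first, then falls back to 'items'.
    """
    entries = renderer.get('contents') or renderer.get('items')
    if not isinstance(entries, list):
        return None
    # Assumed key path to the token inside a continuation entry.
    path = ('continuationItemRenderer', 'continuationEndpoint',
            'continuationCommand', 'token')
    # The token is expected in the last entry, so search from the end.
    for entry in reversed(entries):
        node = entry
        for key in path:
            node = node.get(key) if isinstance(node, dict) else None
        if isinstance(node, str) and node:
            return node
    return None


def _continuation_entry(token):
    """Build a made-up continuation entry for the examples below."""
    return {'continuationItemRenderer': {
        'continuationEndpoint': {'continuationCommand': {'token': token}}}}


# Made-up sample data: one renderer uses 'items', the other 'contents'.
grid_renderer = {'items': [{'gridVideoRenderer': {'videoId': 'aaa'}},
                           _continuation_entry('TOKEN_FROM_ITEMS')]}
list_renderer = {'contents': [{'playlistVideoRenderer': {'videoId': 'bbb'}},
                              _continuation_entry('TOKEN_FROM_CONTENTS')]}

print(find_continuation_token(grid_renderer))
print(find_continuation_token(list_renderer))
```

Walking the key path one step at a time keeps the helper from raising when an entry has an unexpected shape; it simply returns None in that case.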
Now onto where it is processing the page it gets using the token: https://github.com/ytdl-org/youtube-dl/blob/a4bdc3112bf0e925afc2e512d5f23f9097f6bc7a/youtube_dl/extractor/youtube.py#L2466-L2478
I have uploaded the contents of the response from line 2440 here: https://pastebin.com/JrFsAfEn
It gets the continuation_items from the try_get on line 2466. However, youtube-dl then proceeds to incorrectly extract the “renderer” out of this, as in this case continuation_items is a list of "gridVideoRenderer"s.
Quick fix I came up with:
Now everything seems to work as expected.
For some reason the existing code here is calling self._playlist_entries, but at least with the data I was getting we want the self._grid_entries method. Has this broken into part of the normal playlist extractor (not sure if that’s a thing)?
I need to get some sleep so I’ll stop there. Hopefully that helps.
I think what I am seeing is possibly different from this issue, so I’ve opened #28075.