BBC News stories no longer showing videos to download (+BBC Sports parsing failure)
Checklist
- I’m reporting a broken site support
- I’ve verified that I’m running youtube-dl version 2021.06.06
- I’ve checked that all provided URLs are alive and playable in a browser
- I’ve checked that all URLs and arguments with special characters are properly quoted or escaped
- I’ve searched the bugtracker for similar issues including closed ones
Verbose log
[debug] System config: []
[debug] User config: ['--ffmpeg-location', 'C:\\Program Files\\ffmpeg-20200311-36aaee2-win64-shared\\bin', '-f', '137+bestaudio/298+bestaudio/136+bestaudio/135+bestaudio/134+bestaudio/DASH-VIDEO-1+bestaudio/html5-video-high+html5-audio-high/best/bestvideo+bestaudio', '--write-sub', '--convert-subs', 'srt', '--embed-subs', '--fragment-retries', 'infinite', '--retries', 'infinite']
[debug] Custom config: []
[debug] Command-line args: ['-v', 'https://www.bbc.co.uk/news/business-58423705']
[debug] Encodings: locale cp1252, fs utf-8, out utf-8, pref cp1252
[debug] youtube-dl version 2021.06.06
[debug] Python version 3.9.7 (CPython) - Windows-10-10.0.19041-SP0
[debug] exe versions: ffmpeg git-2020-03-11-36aaee2, ffprobe git-2020-03-11-36aaee2, rtmpdump 2.4-20151223-gfa8646d-GnuTLS_3.5.12-i686-static
[debug] Proxy map: {}
[bbc] business-58423705: Downloading webpage
[download] Downloading playlist: CEO Secrets: The bra boss busting stereotypes
[bbc] playlist CEO Secrets: The bra boss busting stereotypes: Collected 0 video ids (downloading 0 of them)
[download] Finished downloading playlist: CEO Secrets: The bra boss busting stereotypes
Description
In the last couple of days, BBC News stories such as this and this are parsed as having zero videos, despite having playable videos. The issue does not appear to impact video-centric "/av/" pages such as this and this (a dedicated page for the second link above).
I use youtube-dl for this because my netbook struggles to play these videos in the browser itself, but MPC-HC can do it.
This issue presents differently to that of BBC Sports stories with videos, which appear not to be parsed correctly at all:
[bbc] 58404777: Downloading webpage
ERROR: Unable to extract playlist data; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see https://yt-dl.org/update on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
Traceback (most recent call last):
File "c:\program files\python39\lib\site-packages\youtube_dl\YoutubeDL.py", line 815, in wrapper
return func(self, *args, **kwargs)
File "c:\program files\python39\lib\site-packages\youtube_dl\YoutubeDL.py", line 836, in __extract_info
ie_result = ie.extract(url)
File "c:\program files\python39\lib\site-packages\youtube_dl\extractor\common.py", line 534, in extract
ie_result = self._real_extract(url)
File "c:\program files\python39\lib\site-packages\youtube_dl\extractor\bbc.py", line 1253, in _real_extract
self._search_regex(
File "c:\program files\python39\lib\site-packages\youtube_dl\extractor\common.py", line 1012, in _search_regex
raise RegexNotFoundError('Unable to extract %s' % _name)
Issue Analytics
- State:
- Created 2 years ago
- Comments:5 (2 by maintainers)
PR #30292 takes care of that:
Unmerged PRs exist for BBC but even with these, the News pages linked above are parsed as playlists with no videos.
For the business news pages, the extractor is looking at the dict-ified JSON page model initial_data, sent as the value assigned to the JS variable window.__INITIAL_DATA__. It expects to, and does, find an object x with initial_data['x']['name'] == 'article'. Then it expects to find x['data']['blocks'] and tries to parse the programme id and other metadata from there. In these new pages, the wanted information is in x['data']['content']['model']['blocks'] instead.
The solution is to change the getter lambda x: x['data']['blocks'] at line 1208 of extractor/bbc.py to this tuple: (lambda x: x['data']['blocks'], lambda x: x['data']['content']['model']['blocks'],)
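To illustrate why a tuple of getters covers both page layouts, here is a minimal standalone sketch. It does not import youtube-dl: try_getters is a hypothetical stand-in written for this example (modelled loosely on how a multi-getter lookup could behave), and the two sample page models, including their block type names, are invented data rather than real BBC responses.

```python
def try_getters(src, getters, expected_type=None):
    """Return the result of the first getter that succeeds, else None.

    Hypothetical helper for illustration only; not youtube-dl's own code.
    """
    if not isinstance(getters, (list, tuple)):
        getters = [getters]
    for getter in getters:
        try:
            value = getter(src)
        except (AttributeError, KeyError, TypeError, IndexError):
            # This layout does not match; fall through to the next getter.
            continue
        if expected_type is None or isinstance(value, expected_type):
            return value
    return None


# The tuple proposed above: old location first, new location second.
BLOCK_GETTERS = (
    lambda x: x['data']['blocks'],
    lambda x: x['data']['content']['model']['blocks'],
)

# Invented sample objects standing in for the 'article' entry of the page model.
old_style = {'name': 'article', 'data': {'blocks': [{'type': 'media'}]}}
new_style = {
    'name': 'article',
    'data': {'content': {'model': {'blocks': [{'type': 'video'}]}}},
}
no_blocks = {'name': 'article', 'data': {}}

old_blocks = try_getters(old_style, BLOCK_GETTERS, list)
new_blocks = try_getters(new_style, BLOCK_GETTERS, list)
missing = try_getters(no_blocks, BLOCK_GETTERS, list)
```

With only the first getter, new_style would raise a KeyError inside the lookup and yield nothing, which matches the "Collected 0 video ids" symptom in the log; with both getters, either layout returns its block list.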
The Sport example doesn’t have a video now, but with PR #28577 (unmerged) this page https://www.bbc.co.uk/sport/football/58488393 is extracted as a playlist with one video.