BBC News stories no longer showing videos to download (+BBC Sports parsing failure)

Checklist

  • I’m reporting a broken site support
  • I’ve verified that I’m running youtube-dl version 2021.06.06
  • I’ve checked that all provided URLs are alive and playable in a browser
  • I’ve checked that all URLs and arguments with special characters are properly quoted or escaped
  • I’ve searched the bugtracker for similar issues including closed ones

Verbose log

[debug] System config: []
[debug] User config: ['--ffmpeg-location', 'C:\\Program Files\\ffmpeg-20200311-36aaee2-win64-shared\\bin', '-f', '137+bestaudio/298+bestaudio/136+bestaudio/135+bestaudio/134+bestaudio/DASH-VIDEO-1+bestaudio/html5-video-high+html5-audio-high/best/bestvideo+bestaudio', '--write-sub', '--convert-subs', 'srt', '--embed-subs', '--fragment-retries', 'infinite', '--retries', 'infinite']
[debug] Custom config: []
[debug] Command-line args: ['-v', 'https://www.bbc.co.uk/news/business-58423705']
[debug] Encodings: locale cp1252, fs utf-8, out utf-8, pref cp1252
[debug] youtube-dl version 2021.06.06
[debug] Python version 3.9.7 (CPython) - Windows-10-10.0.19041-SP0
[debug] exe versions: ffmpeg git-2020-03-11-36aaee2, ffprobe git-2020-03-11-36aaee2, rtmpdump 2.4-20151223-gfa8646d-GnuTLS_3.5.12-i686-static
[debug] Proxy map: {}
[bbc] business-58423705: Downloading webpage
[download] Downloading playlist: CEO Secrets: The bra boss busting stereotypes
[bbc] playlist CEO Secrets: The bra boss busting stereotypes: Collected 0 video ids (downloading 0 of them)
[download] Finished downloading playlist: CEO Secrets: The bra boss busting stereotypes

Description

In the last couple of days, BBC News stories such as this and this are parsed as having zero videos, despite having playable videos. The issue does not appear to affect video-centric "/av/" pages such as this and this (a dedicated page for the second link above).

I use youtube-dl for this because my netbook struggles to play these videos in the browser itself, but MPC-HC can do it.


This issue differs from the one affecting BBC Sport stories with videos, which appear not to be parsed correctly at all:

[bbc] 58404777: Downloading webpage
ERROR: Unable to extract playlist data; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
Traceback (most recent call last):
  File "c:\program files\python39\lib\site-packages\youtube_dl\YoutubeDL.py", line 815, in wrapper
    return func(self, *args, **kwargs)
  File "c:\program files\python39\lib\site-packages\youtube_dl\YoutubeDL.py", line 836, in __extract_info
    ie_result = ie.extract(url)
  File "c:\program files\python39\lib\site-packages\youtube_dl\extractor\common.py", line 534, in extract
    ie_result = self._real_extract(url)
  File "c:\program files\python39\lib\site-packages\youtube_dl\extractor\bbc.py", line 1253, in _real_extract
    self._search_regex(
  File "c:\program files\python39\lib\site-packages\youtube_dl\extractor\common.py", line 1012, in _search_regex
    raise RegexNotFoundError('Unable to extract %s' % _name)

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

Vangelis66 commented, Dec 1, 2021

I used to download those BBC News Headlines every day, does not work anymore

PR #30292 takes care of that:

youtube-dl -F "https://www.bbc.com/news/av/10462520" => 

[bbc] 10462520: Downloading webpage
[bbc] p0b7mdbv: Downloading media selection JSON
[bbc] p0b7mdbv: Downloading m3u8 information
[bbc] p0b7mdbv: Downloading m3u8 information
[bbc] p0b7mdbv: Downloading m3u8 information
[bbc] p0b7mdbv: Downloading m3u8 information
[bbc] p0b7mdbv: Downloading MPD manifest
[bbc] p0b7mdbv: Downloading MPD manifest
[bbc] p0b7mdbv: Downloading MPD manifest
[bbc] p0b7mdbv: Downloading MPD manifest
[download] Downloading playlist: One-minute World News
[bbc] playlist One-minute World News: Collected 1 video ids (downloading 1 of them)
[download] Downloading video 1 of 1
[info] Available formats for p0b7mdbv:
format code                      extension  resolution note
mf_akamai-audio_eng=96000-0      m4a        audio only [en] DASH audio   96k , m4a_dash container, mp4a.40.5 (48000Hz)
mf_akamai-audio_eng=96000-1      m4a        audio only [en] DASH audio   96k , m4a_dash container, mp4a.40.5 (48000Hz)
mf_cloudfront-audio_eng=96000-0  m4a        audio only [en] DASH audio   96k , m4a_dash container, mp4a.40.5 (48000Hz)
mf_cloudfront-audio_eng=96000-1  m4a        audio only [en] DASH audio   96k , m4a_dash container, mp4a.40.5 (48000Hz)
mf_akamai-video=86000-0          mp4        192x108    DASH video   86k , mp4_dash container, avc3.42C015, 25fps, video only
mf_akamai-video=86000-1          mp4        192x108    DASH video   86k , mp4_dash container, avc3.42C015, 25fps, video only
mf_cloudfront-video=86000-0      mp4        192x108    DASH video   86k , mp4_dash container, avc3.42C015, 25fps, video only
mf_cloudfront-video=86000-1      mp4        192x108    DASH video   86k , mp4_dash container, avc3.42C015, 25fps, video only
mf_akamai-video=156000-0         mp4        256x144    DASH video  156k , mp4_dash container, avc3.42C015, 25fps, video only
mf_akamai-video=156000-1         mp4        256x144    DASH video  156k , mp4_dash container, avc3.42C015, 25fps, video only
mf_cloudfront-video=156000-0     mp4        256x144    DASH video  156k , mp4_dash container, avc3.42C015, 25fps, video only
mf_cloudfront-video=156000-1     mp4        256x144    DASH video  156k , mp4_dash container, avc3.42C015, 25fps, video only
mf_akamai-video=281000-0         mp4        384x216    DASH video  281k , mp4_dash container, avc3.42C015, 25fps, video only
mf_akamai-video=281000-1         mp4        384x216    DASH video  281k , mp4_dash container, avc3.42C015, 25fps, video only
mf_cloudfront-video=281000-0     mp4        384x216    DASH video  281k , mp4_dash container, avc3.42C015, 25fps, video only
mf_cloudfront-video=281000-1     mp4        384x216    DASH video  281k , mp4_dash container, avc3.42C015, 25fps, video only
mf_akamai-video=437000-0         mp4        512x288    DASH video  437k , mp4_dash container, avc3.4D4015, 25fps, video only
mf_akamai-video=437000-1         mp4        512x288    DASH video  437k , mp4_dash container, avc3.4D4015, 25fps, video only
mf_cloudfront-video=437000-0     mp4        512x288    DASH video  437k , mp4_dash container, avc3.4D4015, 25fps, video only
mf_cloudfront-video=437000-1     mp4        512x288    DASH video  437k , mp4_dash container, avc3.4D4015, 25fps, video only
mf_akamai-video=827000-0         mp4        704x396    DASH video  827k , mp4_dash container, avc3.4D401F, 25fps, video only
mf_akamai-video=827000-1         mp4        704x396    DASH video  827k , mp4_dash container, avc3.4D401F, 25fps, video only
mf_cloudfront-video=827000-0     mp4        704x396    DASH video  827k , mp4_dash container, avc3.4D401F, 25fps, video only
mf_cloudfront-video=827000-1     mp4        704x396    DASH video  827k , mp4_dash container, avc3.4D401F, 25fps, video only
mf_akamai-video=1604000-0        mp4        960x540    DASH video 1604k , mp4_dash container, avc3.64001F, 25fps, video only
mf_akamai-video=1604000-1        mp4        960x540    DASH video 1604k , mp4_dash container, avc3.64001F, 25fps, video only
mf_cloudfront-video=1604000-0    mp4        960x540    DASH video 1604k , mp4_dash container, avc3.64001F, 25fps, video only
mf_cloudfront-video=1604000-1    mp4        960x540    DASH video 1604k , mp4_dash container, avc3.64001F, 25fps, video only
mf_akamai-0                      mp4        256x144     224k , h264
mf_akamai-1                      mp4        256x144     224k , h264
mf_cloudfront-0                  mp4        256x144     224k , h264
mf_cloudfront-1                  mp4        256x144     224k , h264
mf_akamai-349-0                  mp4        384x216     349k , avc1.42C015@ 281k, 25.0fps, mp4a.40.5@ 48k
mf_akamai-349-1                  mp4        384x216     349k , avc1.42C015@ 281k, 25.0fps, mp4a.40.5@ 48k
mf_cloudfront-349-0              mp4        384x216     349k , avc1.42C015@ 281k, 25.0fps, mp4a.40.5@ 48k
mf_cloudfront-349-1              mp4        384x216     349k , avc1.42C015@ 281k, 25.0fps, mp4a.40.5@ 48k
mf_akamai-2                      mp4        448x252     543k , h264
mf_akamai-3                      mp4        448x252     543k , h264
mf_cloudfront-2                  mp4        448x252     543k , h264
mf_cloudfront-3                  mp4        448x252     543k , h264
mf_akamai-565-0                  mp4        512x288     565k , avc1.4D4015@ 437k, 25.0fps, mp4a.40.5@ 96k
mf_akamai-565-1                  mp4        512x288     565k , avc1.4D4015@ 437k, 25.0fps, mp4a.40.5@ 96k
mf_cloudfront-565-0              mp4        512x288     565k , avc1.4D4015@ 437k, 25.0fps, mp4a.40.5@ 96k
mf_cloudfront-565-1              mp4        512x288     565k , avc1.4D4015@ 437k, 25.0fps, mp4a.40.5@ 96k
mf_akamai-4                      mp4        640x360     800k , h264
mf_akamai-5                      mp4        640x360     800k , h264
mf_cloudfront-4                  mp4        640x360     800k , h264
mf_cloudfront-5                  mp4        640x360     800k , h264
mf_akamai-979-0                  mp4        704x396     979k , avc1.4D401F@ 827k, 25.0fps, mp4a.40.5@ 96k
mf_akamai-979-1                  mp4        704x396     979k , avc1.4D401F@ 827k, 25.0fps, mp4a.40.5@ 96k
mf_cloudfront-979-0              mp4        704x396     979k , avc1.4D401F@ 827k, 25.0fps, mp4a.40.5@ 96k
mf_cloudfront-979-1              mp4        704x396     979k , avc1.4D401F@ 827k, 25.0fps, mp4a.40.5@ 96k
mf_akamai-1802-0                 mp4        960x540    1802k , avc1.64001F@1604k, 25.0fps, mp4a.40.5@ 96k
mf_akamai-1802-1                 mp4        960x540    1802k , avc1.64001F@1604k, 25.0fps, mp4a.40.5@ 96k
mf_cloudfront-1802-0             mp4        960x540    1802k , avc1.64001F@1604k, 25.0fps, mp4a.40.5@ 96k
mf_cloudfront-1802-1             mp4        960x540    1802k , avc1.64001F@1604k, 25.0fps, mp4a.40.5@ 96k (best)
[download] Finished downloading playlist: One-minute World News

dirkf commented, Sep 8, 2021

Unmerged PRs exist for BBC but even with these, the News pages linked above are parsed as playlists with no videos.

For the business news pages, the extractor is looking at the dict-ified JSON page model initial_data sent as the value assigned to the JS variable window.__INITIAL_DATA__. It expects to, and does, find an object x with initial_data['x']['name'] == 'article'. Then it expects to find x['data']['blocks'] and tries to parse the programme id and other metadata from there. In these new pages, the wanted information is in x['data']['content']['model']['blocks'] instead.

The solution is to change the getter lambda x: x['data']['blocks'] in l.1208 of extractor/bbc.py to this tuple: (lambda x: x['data']['blocks'], lambda x: x['data']['content']['model']['blocks'],)
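
As a rough illustration of why the tuple helps, here is a minimal sketch. It assumes the getter is passed to a helper that behaves like youtube-dl's try_get (tries each getter in turn and returns the first result of the expected type); the try_get below is a local stand-in written for this example, and find_blocks, old_page and new_page are hypothetical names and sample data, not taken from extractor/bbc.py.

```python
def try_get(src, getter, expected_type=None):
    """Local stand-in for a try_get-style helper (assumption for this sketch).

    Accepts one getter or a list/tuple of getters and returns the first
    value that can be read without error and matches expected_type.
    """
    if not isinstance(getter, (list, tuple)):
        getter = [getter]
    for get in getter:
        try:
            value = get(src)
        except (AttributeError, KeyError, TypeError, IndexError):
            continue
        if expected_type is None or isinstance(value, expected_type):
            return value
    return None


def find_blocks(article):
    """Hypothetical helper: return the blocks list from either page layout."""
    return try_get(
        article,
        (lambda x: x['data']['blocks'],                      # older layout
         lambda x: x['data']['content']['model']['blocks'],  # newer layout
         ),
        list) or []


# Made-up sample data mimicking the two layouts described above.
old_page = {'name': 'article',
            'data': {'blocks': [{'type': 'media'}]}}
new_page = {'name': 'article',
            'data': {'content': {'model': {'blocks': [{'type': 'media'}]}}}}

print(find_blocks(old_page))
print(find_blocks(new_page))
```

With a single getter, the newer layout raises KeyError inside the lambda and yields no blocks, which would match the "Collected 0 video ids" symptom; with the tuple, the second getter is tried before giving up.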

The Sport example doesn’t have a video now, but with PR #28577 (unmerged) this page https://www.bbc.co.uk/sport/football/58488393 is extracted as a playlist with one video.
