BBC News stories no longer showing videos to download (+BBC Sports parsing failure)

Checklist

I’m reporting a broken site support
I’ve verified that I’m running youtube-dl version 2021.06.06
I’ve checked that all provided URLs are alive and playable in a browser
I’ve checked that all URLs and arguments with special characters are properly quoted or escaped
I’ve searched the bugtracker for similar issues including closed ones

Verbose log

[debug] System config: []
[debug] User config: ['--ffmpeg-location', 'C:\\Program Files\\ffmpeg-20200311-36aaee2-win64-shared\\bin', '-f', '137+bestaudio/298+bestaudio/136+bestaudio/135+bestaudio/134+bestaudio/DASH-VIDEO-1+bestaudio/html5-video-high+html5-audio-high/best/bestvideo+bestaudio', '--write-sub', '--convert-subs', 'srt', '--embed-subs', '--fragment-retries', 'infinite', '--retries', 'infinite']
[debug] Custom config: []
[debug] Command-line args: ['-v', 'https://www.bbc.co.uk/news/business-58423705']
[debug] Encodings: locale cp1252, fs utf-8, out utf-8, pref cp1252
[debug] youtube-dl version 2021.06.06
[debug] Python version 3.9.7 (CPython) - Windows-10-10.0.19041-SP0
[debug] exe versions: ffmpeg git-2020-03-11-36aaee2, ffprobe git-2020-03-11-36aaee2, rtmpdump 2.4-20151223-gfa8646d-GnuTLS_3.5.12-i686-static
[debug] Proxy map: {}
[bbc] business-58423705: Downloading webpage
[download] Downloading playlist: CEO Secrets: The bra boss busting stereotypes
[bbc] playlist CEO Secrets: The bra boss busting stereotypes: Collected 0 video ids (downloading 0 of them)
[download] Finished downloading playlist: CEO Secrets: The bra boss busting stereotypes

Description

In the last couple of days, BBC News stories such as this and this are parsed as having zero videos, despite having playable videos. The issue does not appear to affect video-centric "/av/" pages such as this and this (a dedicated page for the second link above).

I use youtube-dl for this because my netbook struggles to play these videos in the browser itself, whereas MPC-HC can play the downloaded files.

This issue presents differently from the one affecting BBC Sport stories with videos, which appear not to be parsed correctly at all:

[bbc] 58404777: Downloading webpage
ERROR: Unable to extract playlist data; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
Traceback (most recent call last):
  File "c:\program files\python39\lib\site-packages\youtube_dl\YoutubeDL.py", line 815, in wrapper
    return func(self, *args, **kwargs)
  File "c:\program files\python39\lib\site-packages\youtube_dl\YoutubeDL.py", line 836, in __extract_info
    ie_result = ie.extract(url)
  File "c:\program files\python39\lib\site-packages\youtube_dl\extractor\common.py", line 534, in extract
    ie_result = self._real_extract(url)
  File "c:\program files\python39\lib\site-packages\youtube_dl\extractor\bbc.py", line 1253, in _real_extract
    self._search_regex(
  File "c:\program files\python39\lib\site-packages\youtube_dl\extractor\common.py", line 1012, in _search_regex
    raise RegexNotFoundError('Unable to extract %s' % _name)

Issue Analytics

State:
Created 2 years ago
Comments: 5 (2 by maintainers)

Top GitHub Comments

Vangelis66 commented, Dec 1, 2021

I used to download those BBC News Headlines every day, does not work anymore

PR #30292 takes care of that:

youtube-dl -F "https://www.bbc.com/news/av/10462520" => 

[bbc] 10462520: Downloading webpage
[bbc] p0b7mdbv: Downloading media selection JSON
[bbc] p0b7mdbv: Downloading m3u8 information
[bbc] p0b7mdbv: Downloading m3u8 information
[bbc] p0b7mdbv: Downloading m3u8 information
[bbc] p0b7mdbv: Downloading m3u8 information
[bbc] p0b7mdbv: Downloading MPD manifest
[bbc] p0b7mdbv: Downloading MPD manifest
[bbc] p0b7mdbv: Downloading MPD manifest
[bbc] p0b7mdbv: Downloading MPD manifest
[download] Downloading playlist: One-minute World News
[bbc] playlist One-minute World News: Collected 1 video ids (downloading 1 of them)
[download] Downloading video 1 of 1
[info] Available formats for p0b7mdbv:
format code                      extension  resolution note
mf_akamai-audio_eng=96000-0      m4a        audio only [en] DASH audio   96k , m4a_dash container, mp4a.40.5 (48000Hz)
mf_akamai-audio_eng=96000-1      m4a        audio only [en] DASH audio   96k , m4a_dash container, mp4a.40.5 (48000Hz)
mf_cloudfront-audio_eng=96000-0  m4a        audio only [en] DASH audio   96k , m4a_dash container, mp4a.40.5 (48000Hz)
mf_cloudfront-audio_eng=96000-1  m4a        audio only [en] DASH audio   96k , m4a_dash container, mp4a.40.5 (48000Hz)
mf_akamai-video=86000-0          mp4        192x108    DASH video   86k , mp4_dash container, avc3.42C015, 25fps, video only
mf_akamai-video=86000-1          mp4        192x108    DASH video   86k , mp4_dash container, avc3.42C015, 25fps, video only
mf_cloudfront-video=86000-0      mp4        192x108    DASH video   86k , mp4_dash container, avc3.42C015, 25fps, video only
mf_cloudfront-video=86000-1      mp4        192x108    DASH video   86k , mp4_dash container, avc3.42C015, 25fps, video only
mf_akamai-video=156000-0         mp4        256x144    DASH video  156k , mp4_dash container, avc3.42C015, 25fps, video only
mf_akamai-video=156000-1         mp4        256x144    DASH video  156k , mp4_dash container, avc3.42C015, 25fps, video only
mf_cloudfront-video=156000-0     mp4        256x144    DASH video  156k , mp4_dash container, avc3.42C015, 25fps, video only
mf_cloudfront-video=156000-1     mp4        256x144    DASH video  156k , mp4_dash container, avc3.42C015, 25fps, video only
mf_akamai-video=281000-0         mp4        384x216    DASH video  281k , mp4_dash container, avc3.42C015, 25fps, video only
mf_akamai-video=281000-1         mp4        384x216    DASH video  281k , mp4_dash container, avc3.42C015, 25fps, video only
mf_cloudfront-video=281000-0     mp4        384x216    DASH video  281k , mp4_dash container, avc3.42C015, 25fps, video only
mf_cloudfront-video=281000-1     mp4        384x216    DASH video  281k , mp4_dash container, avc3.42C015, 25fps, video only
mf_akamai-video=437000-0         mp4        512x288    DASH video  437k , mp4_dash container, avc3.4D4015, 25fps, video only
mf_akamai-video=437000-1         mp4        512x288    DASH video  437k , mp4_dash container, avc3.4D4015, 25fps, video only
mf_cloudfront-video=437000-0     mp4        512x288    DASH video  437k , mp4_dash container, avc3.4D4015, 25fps, video only
mf_cloudfront-video=437000-1     mp4        512x288    DASH video  437k , mp4_dash container, avc3.4D4015, 25fps, video only
mf_akamai-video=827000-0         mp4        704x396    DASH video  827k , mp4_dash container, avc3.4D401F, 25fps, video only
mf_akamai-video=827000-1         mp4        704x396    DASH video  827k , mp4_dash container, avc3.4D401F, 25fps, video only
mf_cloudfront-video=827000-0     mp4        704x396    DASH video  827k , mp4_dash container, avc3.4D401F, 25fps, video only
mf_cloudfront-video=827000-1     mp4        704x396    DASH video  827k , mp4_dash container, avc3.4D401F, 25fps, video only
mf_akamai-video=1604000-0        mp4        960x540    DASH video 1604k , mp4_dash container, avc3.64001F, 25fps, video only
mf_akamai-video=1604000-1        mp4        960x540    DASH video 1604k , mp4_dash container, avc3.64001F, 25fps, video only
mf_cloudfront-video=1604000-0    mp4        960x540    DASH video 1604k , mp4_dash container, avc3.64001F, 25fps, video only
mf_cloudfront-video=1604000-1    mp4        960x540    DASH video 1604k , mp4_dash container, avc3.64001F, 25fps, video only
mf_akamai-0                      mp4        256x144     224k , h264
mf_akamai-1                      mp4        256x144     224k , h264
mf_cloudfront-0                  mp4        256x144     224k , h264
mf_cloudfront-1                  mp4        256x144     224k , h264
mf_akamai-349-0                  mp4        384x216     349k , avc1.42C015@ 281k, 25.0fps, mp4a.40.5@ 48k
mf_akamai-349-1                  mp4        384x216     349k , avc1.42C015@ 281k, 25.0fps, mp4a.40.5@ 48k
mf_cloudfront-349-0              mp4        384x216     349k , avc1.42C015@ 281k, 25.0fps, mp4a.40.5@ 48k
mf_cloudfront-349-1              mp4        384x216     349k , avc1.42C015@ 281k, 25.0fps, mp4a.40.5@ 48k
mf_akamai-2                      mp4        448x252     543k , h264
mf_akamai-3                      mp4        448x252     543k , h264
mf_cloudfront-2                  mp4        448x252     543k , h264
mf_cloudfront-3                  mp4        448x252     543k , h264
mf_akamai-565-0                  mp4        512x288     565k , avc1.4D4015@ 437k, 25.0fps, mp4a.40.5@ 96k
mf_akamai-565-1                  mp4        512x288     565k , avc1.4D4015@ 437k, 25.0fps, mp4a.40.5@ 96k
mf_cloudfront-565-0              mp4        512x288     565k , avc1.4D4015@ 437k, 25.0fps, mp4a.40.5@ 96k
mf_cloudfront-565-1              mp4        512x288     565k , avc1.4D4015@ 437k, 25.0fps, mp4a.40.5@ 96k
mf_akamai-4                      mp4        640x360     800k , h264
mf_akamai-5                      mp4        640x360     800k , h264
mf_cloudfront-4                  mp4        640x360     800k , h264
mf_cloudfront-5                  mp4        640x360     800k , h264
mf_akamai-979-0                  mp4        704x396     979k , avc1.4D401F@ 827k, 25.0fps, mp4a.40.5@ 96k
mf_akamai-979-1                  mp4        704x396     979k , avc1.4D401F@ 827k, 25.0fps, mp4a.40.5@ 96k
mf_cloudfront-979-0              mp4        704x396     979k , avc1.4D401F@ 827k, 25.0fps, mp4a.40.5@ 96k
mf_cloudfront-979-1              mp4        704x396     979k , avc1.4D401F@ 827k, 25.0fps, mp4a.40.5@ 96k
mf_akamai-1802-0                 mp4        960x540    1802k , avc1.64001F@1604k, 25.0fps, mp4a.40.5@ 96k
mf_akamai-1802-1                 mp4        960x540    1802k , avc1.64001F@1604k, 25.0fps, mp4a.40.5@ 96k
mf_cloudfront-1802-0             mp4        960x540    1802k , avc1.64001F@1604k, 25.0fps, mp4a.40.5@ 96k
mf_cloudfront-1802-1             mp4        960x540    1802k , avc1.64001F@1604k, 25.0fps, mp4a.40.5@ 96k (best)
[download] Finished downloading playlist: One-minute World News

dirkf commented, Sep 8, 2021

Unmerged PRs exist for BBC but even with these, the News pages linked above are parsed as playlists with no videos.

For the business news pages, the extractor is looking at the dict-ified JSON page model initial_data sent as the value assigned to the JS variable window.__INITIAL_DATA__. It expects to, and does, find an object x with initial_data['x']['name'] == 'article'. Then it expects to find x['data']['blocks'] and tries to parse the programme id and other metadata from there. In these new pages, the wanted information is in x['data']['content']['model']['blocks'] instead.

The solution is to change the getter lambda x: x['data']['blocks'] at line 1208 of extractor/bbc.py to this tuple, so that the old location is tried first and the new one second: (lambda x: x['data']['blocks'], lambda x: x['data']['content']['model']['blocks'],)
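
To illustrate why a tuple of getters handles both page layouts, here is a minimal, self-contained sketch. It assumes the getter is passed to a helper that behaves like youtube-dl's try_get (each getter is tried in turn and lookup errors are swallowed); the try_get below is a simplified stand-in written for this example, not the extractor's actual code. The names find_blocks, old_article and new_article, and the {'type': 'media'} block contents, are hypothetical.

```python
def try_get(src, getters, expected_type=None):
    """Simplified stand-in for a try_get-style helper (assumption):
    try each getter in order and return the first value of the expected type."""
    if not isinstance(getters, (list, tuple)):
        getters = [getters]
    for getter in getters:
        try:
            value = getter(src)
        except (AttributeError, KeyError, TypeError, IndexError):
            # This layout does not match; fall through to the next getter.
            continue
        if expected_type is None or isinstance(value, expected_type):
            return value
    return None


# The tuple proposed in the comment: old location first, new location second.
BLOCK_GETTERS = (
    lambda x: x['data']['blocks'],
    lambda x: x['data']['content']['model']['blocks'],
)


def find_blocks(article):
    """Return the list of blocks from an 'article' object, or [] if absent."""
    return try_get(article, BLOCK_GETTERS, list) or []


# Hypothetical page-model fragments; real pages carry far more data.
old_article = {'name': 'article', 'data': {'blocks': [{'type': 'media'}]}}
new_article = {
    'name': 'article',
    'data': {'content': {'model': {'blocks': [{'type': 'media'}]}}},
}

print(find_blocks(old_article))
print(find_blocks(new_article))
```

With only the first getter, the new layout raises a KeyError inside the helper and yields nothing, which would be consistent with the "Collected 0 video ids" output in the log above.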

The Sport example doesn’t have a video now, but with PR #28577 (unmerged) this page https://www.bbc.co.uk/sport/football/58488393 is extracted as a playlist with one video.