[rts.ch] Unable to extract internal video id
Checklist
- I’m reporting a broken site
- I’ve verified that I’m running yt-dlp version 2022.02.04. (update instructions)
- I’ve checked that all provided URLs are alive and playable in a browser
- I’ve checked that all URLs and arguments with special characters are properly quoted or escaped
- I’ve searched the bugtracker for similar issues including closed ones. DO NOT post duplicates
- I’ve read the guidelines for opening an issue
- I’ve read about sharing account credentials and I’m willing to share it if required
Region
Anywhere
Description
Looks like some regex issue. Test link: https://www.rts.ch/info/regions/valais/12865814-un-bouquetin-emporte-par-un-aigle-royal-sur-les-hauts-de-fully-vs.html
Verbose log
yt-dlp -v -F https://www.rts.ch/info/regions/valais/12865814-un-bouquetin-emporte-par-un-aigle-royal-sur-les-hauts-de-fully-vs.html
[debug] Command-line config: ['-v', '-F', 'https://www.rts.ch/info/regions/valais/12865814-un-bouquetin-emporte-par-un-aigle-royal-sur-les-hauts-de-fully-vs.html']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, err utf-8, pref UTF-8
[debug] yt-dlp version 2022.02.03 [28469edd7] (zip)
[debug] Plugins: ['SamplePluginIE', 'SamplePluginPP']
[debug] Python version 3.9.10 (CPython 64bit) - macOS-11.6.3-arm64-arm-64bit
[debug] exe versions: none
[debug] Optional libraries: sqlite
[debug] Proxy map: {}
[debug] [RTS] Extracting URL: https://www.rts.ch/info/regions/valais/12865814-un-bouquetin-emporte-par-un-aigle-royal-sur-les-hauts-de-fully-vs.html
[RTS] un-bouquetin-emporte-par-un-aigle-royal-sur-les-hauts-de-fully-vs: Downloading JSON metadata
[RTS] un-bouquetin-emporte-par-un-aigle-royal-sur-les-hauts-de-fully-vs: Downloading webpage
ERROR: [RTS] 12865814: Unable to extract internal video id; please report this issue on https://github.com/yt-dlp/yt-dlp , filling out the "Broken site" issue template properly. Confirm you are on the latest version using -U; please report this issue on https://github.com/yt-dlp/yt-dlp , filling out the "Broken site" issue template properly. Confirm you are on the latest version using -U
File "/Users/zig/Downloads/DrB/yt-dlp/./yt-dlp/yt_dlp/extractor/common.py", line 615, in extract
ie_result = self._real_extract(url)
File "/Users/zig/Downloads/DrB/yt-dlp/./yt-dlp/yt_dlp/extractor/rts.py", line 159, in _real_extract
internal_id = self._html_search_regex(
File "/Users/zig/Downloads/DrB/yt-dlp/./yt-dlp/yt_dlp/extractor/common.py", line 1198, in _html_search_regex
res = self._search_regex(pattern, string, name, default, fatal, flags, group)
File "/Users/zig/Downloads/DrB/yt-dlp/./yt-dlp/yt_dlp/extractor/common.py", line 1189, in _search_regex
raise RegexNotFoundError('Unable to extract %s' % _name)
Issue Analytics
- State:
- Created 2 years ago
- Comments:8 (7 by maintainers)
The issue with the test framework should ideally be tracked in a separate issue. For #5275, it’s sufficient to just add `skip_download`.
In addition to this, many of the extractor tests fail.
The regex error is just the start. The pattern has to be adjusted to find things like `data-media-urn="urn:rts:video:nnnnnnnn"` to get the numeric id nnnnnnnn.

The extractor fetches three sources of metadata as well as the page itself:

1. `'http://www.rts.ch/a/%s.html?f=json/article' % item_id`, where `item_id` is the ID extracted from the URL (in this case 12865814)
2. the same URL with the `internal_id` extracted from the webpage as `item_id` (in this case it should be 12861415, once the pattern is modified)
3. via the `_get_media_data()` method of the parent `SRGSSRIE` extractor, `'https://il.srgssr.ch/integrationlayer/2.0/%s/mediaComposition/%s/%s.json' % ('rts', 'video', item_id)`, whose result is discarded.

JSON # 1 is valid but contains no media links.

JSON # 2 is valid and contains media links at `.video.JSONinfo.streams` as a dict of format_id:url_path, with a base URL at `.video.JSONinfo.download`. But the URLs constructed by joining each url_path to the base fail because the domain of the base URL isn’t valid.

The discarded JSON # 3 has a media link for the entire show (not just the target clip) at `.chapterList[0].resourceList`, a list of dicts with key `url`. It could be possible to construct the clip URL by finding the segment by clip ID in `.chapterList[0].segmentList` and adding the `markIn` and `markOut` query parameters to the URL, but the segment list isn’t returned with the `onlyChapters=true` query parameter that `_get_media_data()` uses for videos.

A browser session doesn’t use either of the first two API calls. It uses a different endpoint of the third, `'https://il.srgssr.ch/integrationlayer/2.0/mediaComposition/byUrn/urn:%s:%s:%s.json' % ('rts', 'video', item_id)`. The JSON resulting from this has a media link at `.chapterList[i].resourceList[0]`, a dict with key `url`, where `i` is such that `.chapterList[i].id` is the clip ID. This link can be passed to the existing code and finds the target clip.

So this patch gets the clip, but it would need more testing by users in CH, as the base extractor seems to handle a lot of sources that aren’t covered by tests:
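To illustrate the lookup described above, here is a minimal Python sketch. It is not the patch itself. It assumes the page carries a `data-media-urn` attribute of the form quoted earlier, and that the `byUrn` response has the `chapterList` / `resourceList` layout described in the comment. The helper names (`find_media_urn`, `by_urn_url`, `clip_url`), the exact regex (including the `audio` alternative), and the sample page and JSON are made up for illustration. A real extractor would use yt-dlp's own helpers such as `_search_regex` and `_download_json` rather than the bare `re` module.

```python
import re

# Hypothetical pattern: the thread only says the regex must match attributes
# like data-media-urn="urn:rts:video:nnnnnnnn"; the patch's real pattern is not shown.
# Accepting "audio" as well is an assumption.
MEDIA_URN_RE = re.compile(
    r'data-media-urn="urn:rts:(?P<type>video|audio):(?P<id>\d+)"')

# Endpoint template quoted in the comment above.
BY_URN_URL = ('https://il.srgssr.ch/integrationlayer/2.0/'
              'mediaComposition/byUrn/urn:%s:%s:%s.json')


def find_media_urn(webpage):
    """Return (media_type, numeric_id) from the first matching attribute, or None."""
    m = MEDIA_URN_RE.search(webpage)
    return (m.group('type'), m.group('id')) if m else None


def by_urn_url(bu, media_type, media_id):
    """Build the byUrn metadata URL for a business unit, media type and id."""
    return BY_URN_URL % (bu, media_type, media_id)


def clip_url(media_composition, clip_id):
    """Return the first resource URL of the chapter whose id equals clip_id."""
    for chapter in media_composition.get('chapterList') or []:
        if str(chapter.get('id')) == str(clip_id):
            resources = chapter.get('resourceList') or []
            if resources:
                return resources[0].get('url')
    return None


# Made-up sample data, for illustration only (no network access).
page = '<div class="player" data-media-urn="urn:rts:video:12861415"></div>'
composition = {
    'chapterList': [
        {'id': '11111111',
         'resourceList': [{'url': 'https://example.invalid/show.m3u8'}]},
        {'id': '12861415',
         'resourceList': [{'url': 'https://example.invalid/clip.m3u8'}]},
    ]
}

media_type, media_id = find_media_urn(page)
print(by_urn_url('rts', media_type, media_id))
print(clip_url(composition, media_id))
```

Matching the chapter by `id` instead of always taking `chapterList[0]` is the point of the sketch: per the analysis above, the first chapter can describe the whole show and not the target clip.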