problem with double-dot segments (`/../`) after the hostname
Checklist
- I’m reporting a bug unrelated to a specific site
- I’ve verified that I’m running yt-dlp version 2022.04.08 (update instructions) or later (specify commit)
- I’ve checked that all provided URLs are alive and playable in a browser
- I’ve checked that all URLs and arguments with special characters are properly quoted or escaped
- I’ve searched the bugtracker for similar issues including closed ones. DO NOT post duplicates
- I’ve read the guidelines for opening an issue
Description
Some URLs have a double-dot section after the hostname, which causes problems in yt-dlp.
Example: https://streamwo.com/v/gp445h2f. Resolving this URL gives:
$ yt-dlp --get-url https://streamwo.com/v/gp445h2f
https://reoa92d.com/../uploaded/1649416469.mp4#t=0.1
The result has a `../` segment right after the hostname.
Opening this result in a browser or downloading it with curl is no problem:
$ curl -O https://reoa92d.com/../uploaded/1649416469.mp4
...
Succeeds
But yt-dlp fails:
$ yt-dlp https://streamwo.com/v/gp445h2f
[generic] gp445h2f: Requesting header
WARNING: [generic] Falling back on generic information extractor.
[generic] gp445h2f: Downloading webpage
[generic] gp445h2f: Extracting information
[download] Downloading playlist: Streamwo
[generic] playlist Streamwo: Collected 1 videos; downloading 1 of them
[download] Downloading video 1 of 1
[info] gp445h2f: Downloading 1 format(s): 0
ERROR: unable to download video data: HTTP Error 400: Bad Request
[download] Finished downloading playlist: Streamwo
mpv (which uses yt-dlp in its ytdl_hook) fails as well:
$ mpv https://streamwo.com/v/gp445h2f
[ffmpeg] https: HTTP error 400 Bad Request
Failed to open https://reoa92d.com/../uploaded/1649416469.mp4#t=0.1.
Exiting... (Errors when loading file)
Verbose log
$ yt-dlp -vU https://streamwo.com/v/gp445h2f
[debug] Command-line config: ['-vU', 'https://streamwo.com/v/gp445h2f']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, err utf-8, pref UTF-8
[debug] yt-dlp version 2022.04.08 [7884ade65] (zip)
[debug] Python version 3.10.4 (CPython 64bit) - Linux-5.15.32-1-lts-x86_64-with-glibc2.35
[debug] Checking exe version: ffmpeg -bsfs
[debug] Checking exe version: ffprobe -bsfs
[debug] exe versions: ffmpeg 5.0 (setts), ffprobe 5.0, phantomjs 2.1.1, rtmpdump 2.4
[debug] Optional libraries: mutagen, sqlite, websockets
[debug] Proxy map: {}
Latest version: 2022.04.08, Current version: 2022.04.08
yt-dlp is up to date (2022.04.08)
[debug] [generic] Extracting URL: https://streamwo.com/v/gp445h2f
[generic] gp445h2f: Requesting header
WARNING: [generic] Falling back on generic information extractor.
[generic] gp445h2f: Downloading webpage
[generic] gp445h2f: Extracting information
[debug] Looking for video embeds
[debug] Identified a HTML5 media
[debug] Formats sorted by: hasvid, ie_pref, lang, quality, res, fps, hdr:12(7), vcodec:vp9.2(10), acodec, filesize, fs_approx, tbr, vbr, abr, asr, proto, vext, aext, hasaud, source, id
[download] Downloading playlist: Streamwo
[generic] playlist Streamwo: Collected 1 videos; downloading 1 of them
[download] Downloading video 1 of 1
[debug] Default format spec: bestvideo*+bestaudio/best
[info] gp445h2f: Downloading 1 format(s): 0
[debug] Invoking downloader on "https://reoa92d.com/../uploaded/1649416469.mp4#t=0.1"
ERROR: unable to download video data: HTTP Error 400: Bad Request
Traceback (most recent call last):
  File "/home/koonix/./yt-dlp/yt_dlp/YoutubeDL.py", line 3138, in process_info
    success, real_download = self.dl(temp_filename, info_dict)
  File "/home/koonix/./yt-dlp/yt_dlp/YoutubeDL.py", line 2846, in dl
    return fd.download(name, new_info, subtitle)
  File "/home/koonix/./yt-dlp/yt_dlp/downloader/common.py", line 457, in download
    ret = self.real_download(filename, info_dict)
  File "/home/koonix/./yt-dlp/yt_dlp/downloader/http.py", line 369, in real_download
    establish_connection()
  File "/home/koonix/./yt-dlp/yt_dlp/downloader/http.py", line 128, in establish_connection
    ctx.data = self.ydl.urlopen(request)
  File "/home/koonix/./yt-dlp/yt_dlp/YoutubeDL.py", line 3601, in urlopen
    return self._opener.open(req, timeout=self._socket_timeout)
  File "/usr/lib/python3.10/urllib/request.py", line 525, in open
    response = meth(req, response)
  File "/usr/lib/python3.10/urllib/request.py", line 634, in http_response
    response = self.parent.error(
  File "/usr/lib/python3.10/urllib/request.py", line 563, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.10/urllib/request.py", line 496, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.10/urllib/request.py", line 643, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request
[download] Finished downloading playlist: Streamwo
That would be appropriate since urllib/urllib2 is the source of the problem. Whenever I trace the code around opener stuff I get that old-style Adventure feeling: YOU ARE IN A MAZE OF TWISTY LITTLE PASSAGES, ALL ALIKE.
Requests knows how to handle `..` components.
Some more examples: https://datatracker.ietf.org/doc/html/rfc3986/#section-5.2.4
`urljoin` has some basic support for this, but it won’t work in all situations: https://stackoverflow.com/a/40536115
#3668 will technically resolve this for many
Edit: https://github.com/urllib3/urllib3/blob/314bc8ee91a728f51c2cf04b42353c7b2e12c76b/src/urllib3/util/url.py#L263-L290 is urllib3’s implementation, based on the pseudo-code in the RFC linked above.
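For illustration, here is a minimal sketch of the dot-segment removal described in RFC 3986 section 5.2.4, applied to the path of the URL from this report before a request is made. This is not yt-dlp’s actual fix and not a copy of urllib3’s code; the names `remove_dot_segments` and `normalize_url` are hypothetical, and only `urlsplit`/`urlunsplit` from the standard library are used.

```python
from urllib.parse import urlsplit, urlunsplit


def remove_dot_segments(path):
    """Collapse '.' and '..' segments in a URL path (RFC 3986, section 5.2.4)."""
    output = []
    for segment in path.split('/'):
        if segment == '.':
            # A single dot refers to the current level: drop it.
            continue
        if segment == '..':
            # A double dot removes the previous segment, if there is one.
            if output:
                output.pop()
            continue
        output.append(segment)

    # Restore the leading slash if popping segments removed it.
    if path.startswith('/') and (not output or output[0]):
        output.insert(0, '')
    # A path ending in a dot segment must end with a slash.
    if path.endswith(('/.', '/..')):
        output.append('')
    return '/'.join(output)


def normalize_url(url):
    """Return url with its path normalized; other components are untouched."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=remove_dot_segments(parts.path)))


if __name__ == '__main__':
    print(normalize_url('https://reoa92d.com/../uploaded/1649416469.mp4#t=0.1'))
```

With this sketch, the `/../` right after the hostname collapses to `/`, which appears to be what browsers and curl do before sending the request; the query string and fragment are passed through unchanged.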