problem with double-dot segments (`/../`) after the hostname

Checklist

I’m reporting a bug unrelated to a specific site
I’ve verified that I’m running yt-dlp version 2022.04.08 (update instructions) or later (specify commit)
I’ve checked that all provided URLs are alive and playable in a browser
I’ve checked that all URLs and arguments with special characters are properly quoted or escaped
I’ve searched the bugtracker for similar issues including closed ones. DO NOT post duplicates
I’ve read the guidelines for opening an issue

Description

Some URLs have a double-dot section after the hostname, which causes problems in yt-dlp.

Example: resolving https://streamwo.com/v/gp445h2f gives this:

$ yt-dlp --get-url https://streamwo.com/v/gp445h2f 
https://reoa92d.com/../uploaded/1649416469.mp4#t=0.1

The result has a ../ segment right after the hostname. Opening it in a browser or downloading it with curl is no problem:

$ curl -O https://reoa92d.com/../uploaded/1649416469.mp4
...
Succeeds

But yt-dlp fails:

$ yt-dlp https://streamwo.com/v/gp445h2f 
[generic] gp445h2f: Requesting header
WARNING: [generic] Falling back on generic information extractor.
[generic] gp445h2f: Downloading webpage
[generic] gp445h2f: Extracting information
[download] Downloading playlist: Streamwo
[generic] playlist Streamwo: Collected 1 videos; downloading 1 of them
[download] Downloading video 1 of 1
[info] gp445h2f: Downloading 1 format(s): 0
ERROR: unable to download video data: HTTP Error 400: Bad Request
[download] Finished downloading playlist: Streamwo

mpv (which uses yt-dlp in its ytdl_hook) fails as well:

$ mpv https://streamwo.com/v/gp445h2f                                
[ffmpeg] https: HTTP error 400 Bad Request
Failed to open https://reoa92d.com/../uploaded/1649416469.mp4#t=0.1.

Exiting... (Errors when loading file)

Verbose log

$ yt-dlp -vU https://streamwo.com/v/gp445h2f 
[debug] Command-line config: ['-vU', 'https://streamwo.com/v/gp445h2f']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, err utf-8, pref UTF-8
[debug] yt-dlp version 2022.04.08 [7884ade65] (zip)
[debug] Python version 3.10.4 (CPython 64bit) - Linux-5.15.32-1-lts-x86_64-with-glibc2.35
[debug] Checking exe version: ffmpeg -bsfs
[debug] Checking exe version: ffprobe -bsfs
[debug] exe versions: ffmpeg 5.0 (setts), ffprobe 5.0, phantomjs 2.1.1, rtmpdump 2.4
[debug] Optional libraries: mutagen, sqlite, websockets
[debug] Proxy map: {}
Latest version: 2022.04.08, Current version: 2022.04.08
yt-dlp is up to date (2022.04.08)
[debug] [generic] Extracting URL: https://streamwo.com/v/gp445h2f
[generic] gp445h2f: Requesting header
WARNING: [generic] Falling back on generic information extractor.
[generic] gp445h2f: Downloading webpage
[generic] gp445h2f: Extracting information
[debug] Looking for video embeds
[debug] Identified a HTML5 media
[debug] Formats sorted by: hasvid, ie_pref, lang, quality, res, fps, hdr:12(7), vcodec:vp9.2(10), acodec, filesize, fs_approx, tbr, vbr, abr, asr, proto, vext, aext, hasaud, source, id
[download] Downloading playlist: Streamwo
[generic] playlist Streamwo: Collected 1 videos; downloading 1 of them
[download] Downloading video 1 of 1
[debug] Default format spec: bestvideo*+bestaudio/best
[info] gp445h2f: Downloading 1 format(s): 0
[debug] Invoking downloader on "https://reoa92d.com/../uploaded/1649416469.mp4#t=0.1"
ERROR: unable to download video data: HTTP Error 400: Bad Request
Traceback (most recent call last):
  File "/home/koonix/./yt-dlp/yt_dlp/YoutubeDL.py", line 3138, in process_info
    success, real_download = self.dl(temp_filename, info_dict)
  File "/home/koonix/./yt-dlp/yt_dlp/YoutubeDL.py", line 2846, in dl
    return fd.download(name, new_info, subtitle)
  File "/home/koonix/./yt-dlp/yt_dlp/downloader/common.py", line 457, in download
    ret = self.real_download(filename, info_dict)
  File "/home/koonix/./yt-dlp/yt_dlp/downloader/http.py", line 369, in real_download
    establish_connection()
  File "/home/koonix/./yt-dlp/yt_dlp/downloader/http.py", line 128, in establish_connection
    ctx.data = self.ydl.urlopen(request)
  File "/home/koonix/./yt-dlp/yt_dlp/YoutubeDL.py", line 3601, in urlopen
    return self._opener.open(req, timeout=self._socket_timeout)
  File "/usr/lib/python3.10/urllib/request.py", line 525, in open
    response = meth(req, response)
  File "/usr/lib/python3.10/urllib/request.py", line 634, in http_response
    response = self.parent.error(
  File "/usr/lib/python3.10/urllib/request.py", line 563, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.10/urllib/request.py", line 496, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.10/urllib/request.py", line 643, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request

[download] Finished downloading playlist: Streamwo

Top GitHub Comments

dirkf commented, Apr 9, 2022

… it’s better to handle this directly in the url_opener

That would be appropriate since urllib/urllib2 is the source of the problem. Whenever I trace the code around opener stuff I get that old-style Adventure feeling: YOU ARE IN A MAZE OF TWISTY LITTLE PASSAGES, ALL ALIKE.

Requests knows how to handle .. components.

coletdjnz commented, Jun 22, 2022

This is what’s in the webpage:
                    <video id="my-video" class="video-js vjs-16-9 vjs-big-play-centered" loop controls playsinline preload="auto" data-setup="{}" > 
                        <source src="https://reoa92d.com/../uploaded/1649416469.mp4#t=0.1" type="video/mp4" /> 
                    </video>
This apparently [1] invalid URL should be corrected to https://reoa92d.com/uploaded/1649416469.mp4#t=0.1, which Mozilla does, but compat_urllib_request.Request() doesn’t. The URL specification says that a .. component should remove the last segment of the path parsed so far, which means doing nothing when there is no path yet, as is the case here.
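
To make the rule above concrete, here is a minimal sketch of dot-segment removal on a URL path. The function name `remove_dot_segments` is hypothetical and this is not yt-dlp code; it only illustrates the behaviour described, assuming the path has already been separated from the rest of the URL.

```python
def remove_dot_segments(path: str) -> str:
    """Collapse '.' and '..' segments in a URL path (illustrative helper)."""
    output = []
    for segment in path.split("/"):
        if segment == ".":
            continue            # '.' refers to the current level: skip it
        if segment == "..":
            if output:
                output.pop()    # '..' drops the previous segment, if any
            continue            # with nothing left to drop, '..' is ignored
        output.append(segment)

    # Restore the leading slash if a '..' consumed the empty first segment.
    if path.startswith("/") and (not output or output[0] != ""):
        output.insert(0, "")
    # A path ending in '.' or '..' names a directory: keep a trailing slash.
    if path.endswith(("/.", "/..")):
        output.append("")
    return "/".join(output)


print(remove_dot_segments("/../uploaded/1649416469.mp4"))  # /uploaded/1649416469.mp4
print(remove_dot_segments("/path/../no/this/path"))        # /no/this/path
```

The leading `..` has nothing to remove, so it simply disappears, which matches what the browser does with the URL from the page.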

We could fix such a URL in the core processing of the extracted info_dict (in sanitise_url(), e.g.); alternatively, it could be fixed just before opening for download (in sanitized_Request(), e.g.).
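
As a rough sketch of the second option, the URL could be cleaned up right before the request object is built. The helper name `sanitize_dot_segments` is made up for illustration and is not an existing yt-dlp function. It also leans on `posixpath.normpath` as a convenient stand-in for proper dot-segment removal, which is an assumption of this sketch: `normpath` additionally collapses repeated slashes, so a real fix would probably want a dedicated implementation.

```python
import posixpath
import urllib.request
from urllib.parse import urlsplit, urlunsplit


def sanitize_dot_segments(url: str) -> str:
    """Return url with '.' and '..' path segments resolved (hypothetical helper)."""
    parts = urlsplit(url)
    path = parts.path
    if not path:
        return url  # nothing to normalise, e.g. "https://example.com"
    # Shortcut: normpath resolves '..' and never climbs above the root.
    normalized = posixpath.normpath(path)
    # normpath strips trailing slashes; put one back for directory-like paths.
    if path.endswith(("/", "/.", "/..")) and not normalized.endswith("/"):
        normalized += "/"
    # Query and fragment are carried over unchanged.
    return urlunsplit(parts._replace(path=normalized))


url = "https://reoa92d.com/../uploaded/1649416469.mp4#t=0.1"
clean = sanitize_dot_segments(url)
print(clean)  # https://reoa92d.com/uploaded/1649416469.mp4#t=0.1

# The cleaned URL is what would then be handed to the opener.
request = urllib.request.Request(clean)
```

Doing this at request-construction time would cover every downloader path that goes through the opener, at the cost of the info_dict still carrying the original URL.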

urllib.parse.urlparse() doesn’t implement the URL parsing algorithm as specified, even when there is a path component before the ..:
$ python3.9
Python 3.9.7 (default, Sep  4 2021, 18:19:10) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib.parse as urlparse
>>> urlparse.urlparse('http://som.dom.com/path/../no/this/path')
ParseResult(scheme='http', netloc='som.dom.com', path='/path/../no/this/path', params='', query='', fragment='')
>>>
[1] Only apparently, because the WHATWG specs essentially make all invalid Web constructs valid for backward compatibility (quirks).