urllib3 does not urlencode redirect targets on redirect.
I am crawling pages and I’m getting UnicodeEncodeError.
The problematic url is : http://www.efind.co.il/detailed/52133.html
I’ve isolated my code to this:
from urllib3 import PoolManager

url = "http://www.efind.co.il/detailed/52133.html"
manager = PoolManager()
response = manager.request('GET', url)
This URL redirects to articles.efind.co.il/info/דגי-נוי-בבריכה-מאמר, which is not ASCII.
Because the redirect is followed automatically, I can’t do anything about it. I am quoting the first URL before giving it to urllib3, but I don’t have control over the redirected URLs.
Is there anything I can do?
Issue Analytics
- State:
- Created 7 years ago
- Comments:9 (7 by maintainers)
So this looks potentially like it’s a bit of a urllib3 bug. It’s passing the redirect URL straight back to httplib, but it needs to be urlencoded first, which is apparently not something urllib3 has ever done.
So yes, this is a bug.
I am totally up for urllib3 using rfc3986.
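Until that is fixed in the library, one possible workaround is to switch off automatic redirects and percent-encode each target yourself before following it. What follows is only a rough sketch, not how urllib3 does or will do it. The helper names `encode_redirect_target` and `request_following_redirects` are made up for illustration. It assumes the Location value arrives as correctly decoded text and that the host name is already ASCII (only path, query and fragment are touched). Depending on the Python version, the header may need decoding before this step.

```python
from urllib.parse import quote, urljoin, urlsplit, urlunsplit


def encode_redirect_target(location):
    """Percent-encode non-ASCII characters in the path, query and fragment.

    Hypothetical helper. The host is left untouched (assumed to be ASCII).
    """
    parts = urlsplit(location)
    # quote() encodes text as UTF-8 by default. "%" is listed as safe so that
    # sequences which are already escaped are not escaped a second time.
    path = quote(parts.path, safe="/%:@!$&'()*+,;=")
    query = quote(parts.query, safe="/?%:@!$&'()*+,;=")
    fragment = quote(parts.fragment, safe="/?%:@!$&'()*+,;=")
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))


def request_following_redirects(manager, url, max_redirects=3):
    """Follow redirects by hand, encoding each target first (sketch).

    `manager` is expected to behave like a urllib3 PoolManager.
    """
    for _ in range(max_redirects + 1):
        # redirect=False stops the manager from following the redirect itself.
        response = manager.request("GET", url, redirect=False)
        location = response.get_redirect_location()
        if not location:
            # Not a redirect (or no Location header): this is the final response.
            return response
        # Resolve relative targets against the current URL, then encode.
        url = encode_redirect_target(urljoin(url, location))
    raise RuntimeError("too many redirects")
```

With a real `PoolManager` this would be called as `request_following_redirects(PoolManager(), url)`. Keeping "%" in the safe set makes the encoding step idempotent, so a server that already sends an escaped Location is not double-escaped.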