Quoted url Location header fails to redirect
See original GitHub issueHello,
I’m trying to crawl a website that returns a Location url in the header that is quoted. When that happens the downloadermiddleware_redirect fails to build the correct url.
This little test helps to explain:
def test_quoted_location(self):
req = Request('http://scrapytest.org/first')
utf8_location = u'http%3A//scrapytest.org/ação'.encode('utf-8') # header using quoted UTF-8 encoding
resp = Response('http://scrapytest.org/first', headers={'Location': utf8_location}, status=302)
req_result = self.mw.process_response(req, resp, self.spider)
perc_encoded_utf8_url = 'http://scrapytest.org/a%C3%A7%C3%A3o'
self.assertEquals(perc_encoded_utf8_url, req_result.url)
it fails:
===================================================================== FAILURES =====================================================================
___________________________________________________ RedirectMiddlewareTest.test_quoted_location ____________________________________________________
self = <tests.test_downloadermiddleware_redirect.RedirectMiddlewareTest testMethod=test_quoted_location>
def test_quoted_location(self):
req = Request('http://scrapytest.org/first')
utf8_location = u'http%3A//scrapytest.org/ação'.encode('utf-8') # header using UTF-8 encoding
resp = Response('http://scrapytest.org/first', headers={'Location': utf8_location}, status=302)
req_result = self.mw.process_response(req, resp, self.spider)
perc_encoded_utf8_url = 'http://scrapytest.org/a%C3%A7%C3%A3o'
> self.assertEquals(perc_encoded_utf8_url, req_result.url)
E AssertionError: 'http://scrapytest.org/a%C3%A7%C3%A3o' != 'http://scrapytest.org/http%3A//scrapytest.org/a%C3%A7%C3%A3o'
/home/aurumdev/repos/scrapy/tests/test_downloadermiddleware_redirect.py:177: AssertionError
======================================================= 1 failed, 19 passed in 0.29 seconds ========================================================
The quoted url comes from the website.
Is that a bug?
Thank you
Issue Analytics
- State:
- Created 7 years ago
- Comments:12 (7 by maintainers)
Top Results From Across the Web
header("Location: ".$url) having troubles with long urls
The issue is that when I'm trying to redirect the user to the steam openid page it does not work and redirects them...
Read more >302 Found - HTTP - MDN Web Docs
The HyperText Transfer Protocol (HTTP) 302 Found redirect status response code indicates that the resource requested has been temporarily moved ...
Read more >What is the HTTP 307 Temporary Redirect Status Code - Kinsta
The 307 status code indicates that the target resource resides temporarily under a different URI. Find out more with this in-depth guide.
Read more >urllib.request — Extensible library for opening URLs — Python ...
urllib.request module uses HTTP/1.1 and includes Connection:close header in its HTTP ... Add a header that will not be added to a redirected...
Read more >The Authorization Response - OAuth 2.0 Simplified
The parameters to be added to the query string of the redirect URL are as ... If the request contained a state parameter,...
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
@Tarliton , I’m closing this issue as you found a workaround for the server sending invalid referer values.
sorry, I was wrong at the first post. I used:
instead of:
this:
is different than this:
although, I still can’t explain what’s going on on that website.