Quoted url Location header fails to redirect

Hello,

I’m trying to crawl a website that returns a quoted (percent-encoded) URL in its Location header. When that happens, the redirect downloader middleware (RedirectMiddleware) fails to build the correct URL.

This little test helps to explain:

    def test_quoted_location(self):
        req = Request('http://scrapytest.org/first')
        utf8_location = u'http%3A//scrapytest.org/ação'.encode('utf-8')  # header using quoted UTF-8 encoding
        resp = Response('http://scrapytest.org/first', headers={'Location': utf8_location}, status=302)
        req_result = self.mw.process_response(req, resp, self.spider)
        perc_encoded_utf8_url = 'http://scrapytest.org/a%C3%A7%C3%A3o'
        self.assertEquals(perc_encoded_utf8_url, req_result.url)

It fails:

===================================================================== FAILURES =====================================================================
___________________________________________________ RedirectMiddlewareTest.test_quoted_location ____________________________________________________

self = <tests.test_downloadermiddleware_redirect.RedirectMiddlewareTest testMethod=test_quoted_location>

    def test_quoted_location(self):
        req = Request('http://scrapytest.org/first')
        utf8_location = u'http%3A//scrapytest.org/ação'.encode('utf-8')  # header using UTF-8 encoding
        resp = Response('http://scrapytest.org/first', headers={'Location': utf8_location}, status=302)
        req_result = self.mw.process_response(req, resp, self.spider)
        perc_encoded_utf8_url = 'http://scrapytest.org/a%C3%A7%C3%A3o'
>       self.assertEquals(perc_encoded_utf8_url, req_result.url)
E       AssertionError: 'http://scrapytest.org/a%C3%A7%C3%A3o' != 'http://scrapytest.org/http%3A//scrapytest.org/a%C3%A7%C3%A3o'

/home/aurumdev/repos/scrapy/tests/test_downloadermiddleware_redirect.py:177: AssertionError
======================================================= 1 failed, 19 passed in 0.29 seconds ========================================================

The quoted URL comes from the website itself.

Is that a bug?

Thank you
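
To see why the assertion fails, it may help to look at how a quoted Location value parses. The sketch below uses only Python 3's standard `urllib.parse` as a stand-in; it is not Scrapy's own code, and the variable names are illustrative. With the colon encoded as `%3A`, the value has no recognizable scheme or host, so it is resolved as a path relative to the request URL, which is consistent with the URL reported in the traceback.

```python
from urllib.parse import urljoin, urlsplit, unquote

# Values taken from the test in the issue; variable names are illustrative.
base = "http://scrapytest.org/first"
quoted_location = "http%3A//scrapytest.org/a%C3%A7%C3%A3o"

# The ':' is encoded as %3A, so no scheme or host can be parsed out:
# the whole value looks like a relative path.
parts = urlsplit(quoted_location)
print(repr(parts.scheme), repr(parts.netloc))

# Resolving it against the request URL therefore keeps the original host
# and appends the quoted value as a path.
joined = urljoin(base, quoted_location)
print(joined)

# Unquoting first restores an absolute URL with a scheme and a host.
decoded = unquote(quoted_location)
decoded_parts = urlsplit(decoded)
print(decoded_parts.scheme, decoded_parts.netloc)
```

Whether a client should unquote such a header at all is a separate question: a Location value like this is malformed on the server side, and unquoting every header could break URLs that are legitimately percent-encoded.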

Issue Analytics

  • State: closed
  • Created: 7 years ago
  • Comments: 12 (7 by maintainers)

Top GitHub Comments

redapple commented, Nov 2, 2016

@Tarliton, I’m closing this issue as you found a workaround for the server sending invalid referer values.

Tarliton commented, Nov 2, 2016

Sorry, I was wrong in the first post. I used:

    >>> urllib.quote('http://scrapytest.org/ação')
    'http%3A//scrapytest.org/a%C3%A7%C3%A3o'

instead of:

    >>> urllib.urlencode({'url': 'http://scrapytest.org/ação'})
    'url=http%3A%2F%2Fscrapytest.org%2Fa%C3%A7%C3%A3o'

This:

    http%3A%2F%2Fscrapytest.org%2Fa%C3%A7%C3%A3o

is different from this:

    http%253A%2F%2Fscrapytest.org%2Fa%25C3%25A7%25C3%25A3o

Still, I can’t explain what’s going on with that website.
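
The difference between these encodings can be checked with a short Python 3 sketch. The comment above uses the Python 2 names `urllib.quote` and `urllib.urlencode`; the Python 3 equivalents live in `urllib.parse`. The variable names `once` and `twice` are illustrative labels for the two strings quoted above, not anything the website is known to produce.

```python
from urllib.parse import quote, unquote, urlencode

url = "http://scrapytest.org/ação"

# quote() keeps '/' by default (safe="/") but encodes ':' and non-ASCII bytes.
quoted = quote(url)
print(quoted)

# urlencode() quotes the value with no safe characters, so '/' becomes %2F.
encoded = urlencode({"url": url})
print(encoded)

# The two strings from the comment above, copied verbatim.
once = "http%3A%2F%2Fscrapytest.org%2Fa%C3%A7%C3%A3o"
twice = "http%253A%2F%2Fscrapytest.org%2Fa%25C3%25A7%25C3%25A3o"

# Decoding the first string once recovers the original URL.
print(unquote(once))

# Decoding the second string once gives the quoted form used in the test,
# because each %25 is itself an encoded '%'. A second pass recovers the URL.
print(unquote(twice))
print(unquote(unquote(twice)))
```

One possible reading, offered only as a guess, is that some of the percent signs in the value were encoded a second time before it reached the client. The sketch shows how the strings relate to one another, not what the server actually does.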
