How to get redirect URLs with scrapy-splash
I don't know how to get the redirect URLs with scrapy-splash. Can you help me? For example, http://xxx.xxx.xxx/1.php redirects to http://xxx.xxx.xxx/index.php. How can I get http://xxx.xxx.xxx/index.php with scrapy-splash? My code below returns http://xxx.xxx.xxx/1.php instead of http://xxx.xxx.xxx/index.php:
from scrapy_splash import SplashRequest
# Assumed: CrawlerItem comes from the project's items.py, and `a` is a
# (proxy_host, proxy_port) pair defined elsewhere in the spider.

def parse_get(self, response):
    item = CrawlerItem()
    item['code'] = response.status
    item['current_url'] = response.url
    # Prints http://xxx.xxx.xxx/1.php, not the page it redirects to
    print(response.url)
    yield item

    self.lua_script = """
function main(splash, args)
    -- Register the proxy hook before splash:go so it also covers the first page load
    splash:on_request(function(request)
        request:set_proxy{
            host = "%s",
            port = %d
        }
    end)
    assert(splash:go{
        splash.args.url,
        http_method = splash.args.http_method,
        body = splash.args.body,
        headers = {['Cookie'] = '%s'},
    })
    assert(splash:wait(0.5))
    return {cookies = splash:get_cookies(), html = splash:html()}
end
""" % (a[0], a[1], self.cookie)

    url = 'http://xxx.xxx.xxx/1.php'
    yield SplashRequest(url, self.parse_get, endpoint='execute', magic_response=True,
                        meta={'handle_httpstatus_all': True},
                        args={'lua_source': self.lua_script})
Issue created 6 years ago; 15 comments (6 by maintainers).
Top GitHub Comments
Is there any way to see the redirected URL (the new one) inside scrapy-splash?
@3xp10it Splash handles redirects by itself, so the result you are getting is already from the page you were redirected to. To get its URL, add
url = splash:url()
to the returned values (see the README example under "Use a Lua script to get an HTML response with cookies, headers, body and method set to correct values"). After that, response.url should be the URL of the redirected page.
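A minimal sketch of that answer applied to the question's script, assuming the `execute` endpoint with `magic_response=True`. The Lua return table follows the README example the answer points to (`url = splash:url()`, plus `http_status` taken from `splash:history()`); `LUA_TEMPLATE`, `build_lua_script`, `make_request` and the cookie/proxy parameters are hypothetical names used only for illustration.

```python
# Hypothetical template: the question's script plus the final URL and status.
LUA_TEMPLATE = """
function main(splash, args)
    -- Route requests through the proxy; registered before splash:go
    splash:on_request(function(request)
        request:set_proxy{host = "%(proxy_host)s", port = %(proxy_port)d}
    end)
    assert(splash:go{
        splash.args.url,
        http_method = splash.args.http_method,
        body = splash.args.body,
        headers = {['Cookie'] = '%(cookie)s'},
    })
    assert(splash:wait(0.5))

    -- Last main-frame response, as in the scrapy-splash README example
    local entries = splash:history()
    local last_response = entries[#entries].response
    return {
        url = splash:url(),                  -- final URL, after redirects
        http_status = last_response.status,  -- status of the final page
        cookies = splash:get_cookies(),
        html = splash:html(),
    }
end
"""


def build_lua_script(cookie, proxy_host, proxy_port):
    """Fill the hypothetical cookie/proxy placeholders of LUA_TEMPLATE."""
    return LUA_TEMPLATE % {
        "cookie": cookie,
        "proxy_host": proxy_host,
        "proxy_port": proxy_port,
    }


def make_request(url, callback, lua_source):
    # Imported lazily so build_lua_script works without scrapy-splash installed.
    from scrapy_splash import SplashRequest

    return SplashRequest(
        url,
        callback,
        endpoint="execute",
        magic_response=True,
        meta={"handle_httpstatus_all": True},
        args={"lua_source": lua_source},
    )


def parse_get(response):
    # With magic_response=True, scrapy-splash copies the "url" and "http_status"
    # keys of the Lua result into response.url and response.status.
    yield {"code": response.status, "current_url": response.url}
```

With this script, a request for http://xxx.xxx.xxx/1.php should reach `parse_get` with `response.url` set to http://xxx.xxx.xxx/index.php. Because the cookie is pasted into a single-quoted Lua string, a value containing `'` would break the script; passing it through `args` and reading `splash.args` in Lua avoids that.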