How to get redirect URLs with scrapy-splash

I don’t know how to get the redirect URLs with scrapy-splash. Can you help me? For example, http://xxx.xxx.xxx/1.php redirects to http://xxx.xxx.xxx/index.php. How can I get http://xxx.xxx.xxx/index.php with scrapy-splash? Below is my code, which returns http://xxx.xxx.xxx/1.php instead of http://xxx.xxx.xxx/index.php.

    def parse_get(self, response):
        item = CrawlerItem()
        item['code'] = response.status
        item['current_url'] = response.url
        # The line below prints http://xxx.xxx.xxx/1.php
        print(response.url)


    self.lua_script = """
    function main(splash, args)
      assert(splash:go{
        splash.args.url,
        http_method = splash.args.http_method,
        body = splash.args.body,
        headers = {
          ['Cookie'] = '%s',
        },
      })
      assert(splash:wait(0.5))

      splash:on_request(function(request)
        request:set_proxy{
          host = "%s",
          port = %d
        }
      end)

      return {cookies = splash:get_cookies(), html = splash:html()}
    end
    """ % (self.cookie, a[0], a[1])

    url = 'http://xxx.xxx.xxx/1.php'
    SplashRequest(url, self.parse_get, endpoint='execute', magic_response=True, meta={'handle_httpstatus_all': True}, args={'lua_source': self.lua_script})


Issue Analytics

  • State: open
  • Created 6 years ago
  • Comments: 15 (6 by maintainers)

Top GitHub Comments

civanescu commented, Jan 22, 2019 (3 reactions)

So, is there any way to see the redirected URL (the new one) inside scrapy-splash?

lopuhin commented, Nov 29, 2017 (2 reactions)

@3xp10it Splash handles redirects by itself, so the result you are getting is from the page it was redirected to. To get its URL, you can add url = splash:url() to the return values (see the example in the README under “Use a Lua script to get an HTML response with cookies, headers, body and method set to correct values”). After that, response.url should be the URL of the redirected page.
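
A minimal sketch of that suggestion, assuming the default magic response handling of scrapy-splash, where a 'url' key in the returned Lua table is used to fill response.url. The names LUA_SCRIPT, splash_request_kwargs, was_redirected and the 'original_url' meta key are hypothetical and only serve the illustration; the proxy and cookie handling from the question are left out for brevity.

```python
# Lua script for Splash's `execute` endpoint. The only change relative to
# the script in the question is the extra `url = splash:url()` entry in the
# returned table, as suggested in the comment above.
LUA_SCRIPT = """
function main(splash, args)
  assert(splash:go{
    args.url,
    http_method = args.http_method,
    body = args.body,
    headers = args.headers,
  })
  assert(splash:wait(0.5))
  return {
    url = splash:url(),  -- URL of the page after any redirects
    cookies = splash:get_cookies(),
    html = splash:html(),
  }
end
"""


def splash_request_kwargs(original_url):
    """Build keyword arguments for scrapy_splash.SplashRequest.

    Hypothetical helper: it stores the requested URL under the made-up
    meta key 'original_url' so the callback can compare it with
    response.url later.
    """
    return {
        "endpoint": "execute",
        "magic_response": True,
        "args": {"lua_source": LUA_SCRIPT},
        "meta": {"handle_httpstatus_all": True, "original_url": original_url},
    }


def was_redirected(original_url, final_url):
    """Return True when the final URL differs from the requested one."""
    return original_url != final_url


# Intended use inside a spider (requires scrapy and scrapy-splash):
#
#     yield SplashRequest(url, self.parse_get, **splash_request_kwargs(url))
#
#     def parse_get(self, response):
#         requested = response.meta["original_url"]
#         if was_redirected(requested, response.url):
#             self.logger.info("%s -> %s", requested, response.url)
```

Passing the requested URL through meta is one way to keep both URLs available in the callback; whether response.url reflects the redirect still depends on the script returning the 'url' key.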
