`execute` lua script set proxy fail


I am trying to set a proxy from a Lua script sent to the `execute` endpoint. Here is my code:

def get_random_proxy():
    IPPOOL = eval(requests.get(
        "http://192.168.89.190:8000/?types=0&count=50&country=国内").text)
    random_choose = random.choice(IPPOOL)
    proxy_addr = "http://" + \
        str(random_choose[0]) + ":" + str(random_choose[1])

    return [str(random_choose[0]),random_choose[1]]


class Exp10itSpider(scrapy.Spider):
    name = "exp10it"
    collected_urls = []
    domain = ""
    start_url = ""
    a=get_random_proxy()
    # here print the proxy ip and port as a list
    print(a)
    lua_script = """
    function main(splash, args)
      assert(splash:go{splash.args.url,http_method=splash.args.http_method,body=splash.args.body})
      assert(splash:wait(0.5))

      splash:on_request(function(request)
          request:set_proxy{
              host = "%s",
              port = %d
          }
      end)

      return splash:html()
    end
    """ % (a[0],a[1])

    def start_requests(self):
        urls = [
            'http://httpbin.org/ip'
        ]
        self.domain = urlparse(urls[0]).hostname
        self.start_url = urls[0]
        for url in urls:
              yield SplashRequest(url, self.parse_get, endpoint='execute',
                                    magic_response=True, meta={'handle_httpstatus_all': True},
                                    args={'lua_source': self.lua_script})

Below is the output. The `origin` returned by httpbin is not the proxy IP printed on the first line, so the proxy was not applied. Can you help me?

['101.53.101.172', 9999]
...

2017-11-20 17:49:36 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://httpbin.org/ip via http://192.168.89.190:8050/execute> (referer: None)
<html><head></head><body><pre style="word-wrap: break-word; white-space: pre-wrap;">{
  "origin": "115.174.68.89"
}
</pre></body></html>

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 11

Top GitHub Comments

NullYing commented, Jan 24, 2018 (4 reactions)
    lua_script = """
    function main(splash, args)
      assert(splash:go{splash.args.url,http_method=splash.args.http_method,body=splash.args.body})
      assert(splash:wait(0.5))

      splash:on_request(function(request)
          request:set_proxy{
              host = "http://%s",
              port = %d
          }
      end)

      return splash:html()
    end
    """ % (a[0],a[1])

The host must be prefixed with http://

NullYing commented, Aug 25, 2018 (0 reactions)

Here, I'll just give you my middleware:

# Copyright (C) 2013 by Aivars Kalvans <aivars.kalvans@gmail.com>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.

import re
import random
import base64
import logging

log = logging.getLogger('scrapy.proxies')


class Mode:
    RANDOMIZE_PROXY_EVERY_REQUESTS, RANDOMIZE_PROXY_ONCE, SET_CUSTOM_PROXY = range(3)


class RandomProxy(object):
    def __init__(self, settings):
        self.mode = settings.get('PROXY_MODE')
        self.proxy_list = settings.get('PROXY_LIST')
        self.proxy_enable = settings.get('PROXY_ENABLE')
        self.error_code = settings.get('RETRY_HTTP_CODES')
        self.chosen_proxy = ''

        if not self.proxy_enable:
            return

        if self.mode == Mode.RANDOMIZE_PROXY_EVERY_REQUESTS or self.mode == Mode.RANDOMIZE_PROXY_ONCE:
            if self.proxy_list is None:
                raise KeyError('PROXY_LIST setting is missing')
            self.proxies = {}
            self.proxies_url = {}
            fin = open(self.proxy_list)
            try:
                for line in fin.readlines():
                    parts = re.match(r'(\w+://)([^:]+?:[^@]+?@)?(.+)', line.strip())
                    if not parts:
                        continue

                    # Cut trailing @
                    if parts.group(2):
                        user_pass = parts.group(2)[:-1]
                    else:
                        user_pass = ''
                    self.proxies_url[parts.group(1) + parts.group(3)] = line
                    self.proxies[parts.group(1) + parts.group(3)] = user_pass
            finally:
                fin.close()
            if self.mode == Mode.RANDOMIZE_PROXY_ONCE:
                self.chosen_proxy = random.choice(list(self.proxies.keys()))
        elif self.mode == Mode.SET_CUSTOM_PROXY:
            custom_proxy = settings.get('CUSTOM_PROXY')
            self.proxies = {}
            parts = re.match(r'(\w+://)([^:]+?:[^@]+?@)?(.+)', custom_proxy.strip())
            if not parts:
                raise ValueError('CUSTOM_PROXY is not well formatted')

            if parts.group(2):
                user_pass = parts.group(2)[:-1]
            else:
                user_pass = ''

            self.proxies[parts.group(1) + parts.group(3)] = user_pass
            self.chosen_proxy = parts.group(1) + parts.group(3)

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler.settings)

    def process_request(self, request, spider):
        if not self.proxy_enable:
            return
        splash = request.meta.get("splash")
        # Don't overwrite with a random one (server-side state for IP)
        if 'proxy' in request.meta:
            if request.meta["exception"] is False:
                return
        request.meta["exception"] = False
        if len(self.proxies) == 0:
            raise ValueError('All proxies are unusable, cannot proceed')

        if self.mode == Mode.RANDOMIZE_PROXY_EVERY_REQUESTS:
            proxy_address = random.choice(list(self.proxies.keys()))
        else:
            proxy_address = self.chosen_proxy

        proxy_user_pass = self.proxies[proxy_address]
        # Requests rendered by Splash get the proxy through the Splash args
        if splash and splash['endpoint'] == 'render.html':
            splash['args']["proxy"] = self.proxies_url[proxy_address]
        else:
            request.meta['proxy'] = proxy_address

        if proxy_user_pass:
            basic_auth = 'Basic ' + base64.b64encode(proxy_user_pass.encode()).decode()
            # When the proxy is used through Splash, no username/password header needs to be set
            if splash and splash['endpoint'] == 'render.html':
                pass
            else:
                request.headers['Proxy-Authorization'] = basic_auth
        else:
            log.debug('Proxy user pass not found')
        log.debug('Using proxy <%s>, %d proxies left' % (
                proxy_address, len(self.proxies)))
        return None

    def process_response(self, request, response, spider):
        if response.status in self.error_code:
            self.change_proxy(request)
        return response

    def process_exception(self, request, exception, spider):
        self.change_proxy(request)
        return request

    def change_proxy(self, request):
        if not self.proxy_enable:
            return
        if 'proxy' not in request.meta:
            return
        if self.mode == Mode.RANDOMIZE_PROXY_EVERY_REQUESTS or self.mode == Mode.RANDOMIZE_PROXY_ONCE:
            proxy = request.meta['proxy']
            try:
                if len(self.proxies) > 1:
                    del self.proxies[proxy]
                    del self.proxies_url[proxy]
                else:
                    log.warning('Only one proxy, Don\'t remove it')
            except KeyError:
                # Removal can fail because of multithreading
                pass
            request.meta["exception"] = True
            if self.mode == Mode.RANDOMIZE_PROXY_ONCE:
                proxy_list = list(self.proxies.keys())
                if len(proxy_list) != 0:
                    self.chosen_proxy = random.choice(list(self.proxies.keys()))
            log.info('Removing failed proxy <%s>, %d proxies left' % (
                proxy, len(self.proxies)))
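
The middleware above reads its configuration through `settings.get()`. A minimal `settings.py` sketch that supplies those keys could look like the following. The setting names come from the middleware code. The file path, the proxy address, the module path `myproject.middlewares.RandomProxy`, and the priority `100` are placeholders assumed for illustration, not values from the thread.

```python
# settings.py (sketch): the names below match what RandomProxy reads via settings.get()

PROXY_ENABLE = True

# 0 = RANDOMIZE_PROXY_EVERY_REQUESTS, 1 = RANDOMIZE_PROXY_ONCE, 2 = SET_CUSTOM_PROXY
PROXY_MODE = 0

# Text file with one proxy per line, e.g. http://user:pass@host:port (placeholder path)
PROXY_LIST = "/path/to/proxy/list.txt"

# Only read when PROXY_MODE is 2 (placeholder address)
CUSTOM_PROXY = "http://host:port"

# Responses with these status codes make the middleware switch away from the current proxy
RETRY_HTTP_CODES = [500, 502, 503, 504, 408]

DOWNLOADER_MIDDLEWARES = {
    # Hypothetical module path and priority; adjust to the actual project layout
    "myproject.middlewares.RandomProxy": 100,
}
```

Note that the middleware only forwards the proxy to Splash when the request uses the `render.html` endpoint. Requests sent to `execute`, as in the question, fall through to the `request.meta['proxy']` branch.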
