
Need a way to pass a start_urls parameter in the constructor

See original GitHub issue

I have a Spider that should get its start_urls from an external source: the file system, a database, etc. It would be extremely useful to be able to pass them directly in the constructor, especially when I want to parse several similar sites at different URLs.
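For illustration, one way to do this today, without constructor support, is to load the URLs inside the spider itself by overriding start_requests(). A minimal sketch, assuming a plain text file with one URL per line (the spider name and file name are made up for this example):

import scrapy

class UrlsFromFileSpider(scrapy.Spider):
    # hypothetical spider, not part of the original report
    name = "urls_from_file"

    def start_requests(self):
        # read start URLs from an external source (here: start_urls.txt,
        # one URL per line) instead of hard-coding start_urls on the class
        with open("start_urls.txt") as f:
            for line in f:
                url = line.strip()
                if url:
                    yield scrapy.Request(url, callback=self.parse)

    def parse(self, response):
        self.logger.info("Visited %s", response.url)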

Issue Analytics

  • State: closed
  • Created: 8 years ago
  • Comments: 11 (5 by maintainers)

Top GitHub Comments

25 reactions
kmike commented, Feb 27, 2016

@Felix-neko it should already work; in your example the CrawlerProcess API is used incorrectly: process.crawl expects a Spider class, not a spider instance. This works for me:

import scrapy
from scrapy.crawler import CrawlerProcess

class HabraSpider(scrapy.Spider):
    name = 'habra'

    def parse(self, response):
        print(response.url)

process = CrawlerProcess()
process.crawl(HabraSpider, start_urls=["http://example.com", "http://example.org"])
process.start()
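The same approach also covers URLs coming from an external source: keyword arguments given to process.crawl are forwarded to the spider constructor, and the default Spider.__init__ simply sets them as attributes on the instance. A small sketch, reusing HabraSpider and CrawlerProcess from the snippet above and assuming a urls.txt file with one URL per line:

# urls.txt is an assumed external source with one URL per line
with open("urls.txt") as f:
    urls = [line.strip() for line in f if line.strip()]

process = CrawlerProcess()
process.crawl(HabraSpider, start_urls=urls)
process.start()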
20 reactions
nyov commented, Feb 27, 2016

You can of course override your Spider’s __init__() method to pass in any URLs from elsewhere. This is rather a support question, though; please see http://scrapy.org/community/ for help channels.

As a quick starter, here’s an example:

import scrapy

class MySpider(scrapy.Spider):

    name = "myspider"

    def __init__(self, *args, **kwargs):
        # 'urls' arrives as a single comma-separated string via -a on the command line
        urls = kwargs.pop('urls', '')
        super(MySpider, self).__init__(*args, **kwargs)
        if urls:
            self.start_urls = urls.split(',')
        self.logger.info(self.start_urls)

Run it as: scrapy crawl myspider -a "urls=http://localhost/test/,http://localhost"
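For completeness, the same spider can also be driven from a script; keyword arguments passed to process.crawl reach __init__ exactly like -a arguments do. A short sketch, assuming the MySpider class above:

from scrapy.crawler import CrawlerProcess

process = CrawlerProcess()
# the 'urls' keyword ends up in MySpider.__init__ as kwargs['urls']
process.crawl(MySpider, urls="http://localhost/test/,http://localhost")
process.start()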

Read more comments on GitHub >

Top Results From Across the Web

How to pass a user defined argument in scrapy spider
Spider arguments are passed in the crawl command using the -a option. For example: scrapy crawl myspider -a category=electronics -a domain= ...

Passing Information to a Method or a Constructor
Note: Parameters refers to the list of variables in a method declaration. Arguments are the actual values that are passed in when the...

Spiders — Scrapy 2.7.1 documentation
Spiders can receive arguments that modify their behaviour. Some common uses for spider arguments are to define the start URLs or to restrict ...

Entity types with constructors - EF Core - Microsoft Learn
It's possible to define a constructor with parameters and have EF Core call this constructor when creating an instance of the entity.

Spiders - Roach PHP
While using the $startUrls property is very convenient, it makes a few ... we can either explicitly pass null to the constructor, or...
