Needed a possibility to pass start_urls parameter in constructor
I have a Spider that should get its start_urls from an external source: file system, database, etc.
It would be extremely useful to pass them directly in the constructor, especially when I want to parse several similar sites at different URLs.
Issue Analytics
- State:
- Created 8 years ago
- Comments: 11 (5 by maintainers)
Top GitHub Comments
@Felix-neko it should already work; in your example the CrawlerProcess API is used incorrectly: process.crawl expects a Spider class, not a spider instance. This works for me:
You can of course override your Spider’s __init__() method to pass in URLs from elsewhere. This is rather a support question, though; please see http://scrapy.org/community/ for help channels.

As a quick starter, here’s an example:
Run it as:
scrapy crawl myspider -a "urls=http://localhost/test/,http://localhost"