Don't require 'name' attribute for scrapy.Spider
I think we should make the Spider.name attribute optional. The name is used by SpiderManager to find spiders, but a Spider can be used without a Scrapy project. For users of the runspider command or of CrawlerRunner / CrawlerProcess, it is unnecessary boilerplate.
We can also provide a default value, e.g. self.__class__.__name__, to help with discovery. This has the advantage of a 1-to-1 mapping between spider class names and the names printed to users, so spiders can become easier to find.
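A minimal sketch of the proposed default, written in plain Python so it runs without Scrapy installed. BaseSpider here is a hypothetical stand-in for scrapy.Spider, not Scrapy's real class, and the _auto_name flag is an illustrative assumption. The default is set on the class via __init_subclass__ rather than from self.__class__.__name__ in __init__, on the assumption that a spider loader reads the class attribute before any spider is instantiated.

```python
from typing import Optional


class BaseSpider:
    """Hypothetical stand-in for scrapy.Spider (not Scrapy's real class)."""

    name: Optional[str] = None
    # Illustrative flag: True while no class in the chain has set `name` explicitly.
    _auto_name: bool = True

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        if cls.__dict__.get("name"):
            # Explicit name on this class: keep it, and let subclasses inherit it.
            cls._auto_name = False
        elif cls._auto_name:
            # No explicit name anywhere up the chain: default to the class name.
            cls.name = cls.__name__

    def __init__(self, name: Optional[str] = None):
        # Allow a per-instance override, mirroring the usual constructor argument.
        if name is not None:
            self.name = name


class QuotesSpider(BaseSpider):
    pass  # no name attribute: gets "QuotesSpider"
```

With this scheme, a subclass of an explicitly named spider keeps inheriting that name (as it does today), while spiders that never set a name each get their own class name.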
Opinions?
Issue Analytics
- Created 8 years ago
- Reactions: 1
- Comments: 12 (11 by maintainers)
Top GitHub Comments
If and when this is done, please take into account that Spidermon currently uses spider names, for example to generate unique, spider-specific filenames for storing data on disk. You can search for spider.name there to find some of those usages. Any change in this direction in Scrapy should probably be accompanied by a corresponding change in Spidermon.
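For downstream code such as Spidermon, one way to cope with an optional name is to fall back to the class name wherever a spider-specific identifier is needed. The helpers below (spider_label, spider_filename) are hypothetical and are not Spidermon's actual code; the filename sanitization rule is likewise an assumption.

```python
import re


def spider_label(spider: object) -> str:
    """Hypothetical helper: spider.name if set, else the spider's class name."""
    name = getattr(spider, "name", None)
    return name or type(spider).__name__


def spider_filename(spider: object, suffix: str = ".json") -> str:
    """Hypothetical helper: build a filesystem-safe, spider-specific filename."""
    # Assumed rule: replace any run of characters outside [A-Za-z0-9_.-] with "_".
    safe = re.sub(r"[^A-Za-z0-9_.-]+", "_", spider_label(spider))
    return f"{safe}{suffix}"
```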
Is there actually any greater value in this Spider-class name attribute? Does it make sense to put this emphasis on it, instead of simply using the Spider-class name itself? If, as @kmike says, only SpiderLoader uses it, that code looks straightforward enough to look up the Spider-class name instead (or simply to pull in all [Base]Spider subclasses). The only interesting case here seems to be having different classes with the same name attribute. What's the defined behavior there, and could a user replicate it with Python class inheritance rules instead? If so, I would downgrade this to an extension/add-on feature for those who care, instead of trading in more magic for less importance 😉
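A rough sketch of the class-name lookup suggested above: walk all subclasses of a base class, key each one by its name attribute if present or else by its class name, and report keys claimed by several classes instead of silently picking one. BaseSpider, all_subclasses and build_registry are hypothetical illustrations, not Scrapy's SpiderLoader, whose real duplicate handling may differ.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class BaseSpider:
    """Hypothetical stand-in for scrapy.Spider (not Scrapy's real class)."""

    name: Optional[str] = None


def all_subclasses(cls: type) -> List[type]:
    """Return every direct and indirect subclass of cls, each listed once."""
    seen: List[type] = []
    stack = list(cls.__subclasses__())
    while stack:
        sub = stack.pop()
        if sub not in seen:  # guards against diamond inheritance
            seen.append(sub)
            stack.extend(sub.__subclasses__())
    return seen


def build_registry(base: type) -> Tuple[Dict[str, type], Dict[str, List[type]]]:
    """Key spider classes by `name`, falling back to the class name.

    Returns (registry, duplicates): unambiguous keys map to a single class,
    while keys claimed by several classes are reported rather than resolved.
    """
    found: Dict[str, List[type]] = defaultdict(list)
    for spcls in all_subclasses(base):
        key = getattr(spcls, "name", None) or spcls.__name__
        found[key].append(spcls)
    registry = {k: v[0] for k, v in found.items() if len(v) == 1}
    duplicates = {k: v for k, v in found.items() if len(v) > 1}
    return registry, duplicates
```

Note that a subclass which does not override name inherits its parent's, so plain class inheritance is enough to produce such collisions.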