Add a note that "allowed_domains" should be a list of domains, not URLs
See original GitHub issue. (Just logging the issue before I forget.)
It may seem obvious from the name of the attribute that allowed_domains is about domain names, but it's not uncommon for Scrapy users to make the mistake of writing allowed_domains = ['http://www.example.com'].
I believe it is worth adding a note in http://doc.scrapy.org/en/latest/topics/spiders.html?#scrapy.spiders.Spider.allowed_domains
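A minimal sketch of the mistake and the fix described above. The class below is illustrative (a plain class rather than a real scrapy.Spider subclass, so it runs without Scrapy installed); the attribute names match Scrapy's Spider API:

```python
# Illustrative spider sketch; with Scrapy installed this would
# subclass scrapy.Spider.
class ExampleSpider:
    name = "example"

    # WRONG: a full URL -- OffsiteMiddleware compares request hostnames
    # against these entries, so a URL here never matches anything:
    # allowed_domains = ["http://www.example.com"]

    # RIGHT: a bare domain name; subdomains such as www.example.com
    # are matched as well.
    allowed_domains = ["example.com"]
    start_urls = ["http://www.example.com/"]
```

With the URL form, every request is silently filtered as offsite, which is why the spider appears to crawl nothing.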
Issue Analytics
- State:
- Created: 7 years ago
- Reactions: 2
- Comments: 11 (8 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
+1 to issue a warning. I'm less sure about inferring the domain: for http://www.example.com, should it infer example.com or www.example.com?

What if, instead of simply documenting this, Scrapy detected this case and issued a warning? Even better, it could extract the domain from the URL and use that, while issuing a warning like:
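A sketch of the behavior proposed in the comments above: detect URL-like entries in allowed_domains, warn, and fall back to the URL's hostname. The function name and warning text are illustrative, not Scrapy's actual implementation, and it resolves the open question above by keeping the full hostname (www.example.com) rather than stripping to the registered domain:

```python
import re
import warnings
from urllib.parse import urlparse

URL_RE = re.compile(r"^https?://", re.IGNORECASE)

def clean_allowed_domains(allowed_domains):
    """Hypothetical helper: replace URL-like entries with their
    hostname and emit a warning, leaving bare domains untouched."""
    cleaned = []
    for entry in allowed_domains:
        if URL_RE.match(entry):
            host = urlparse(entry).hostname
            warnings.warn(
                "allowed_domains accepts domains, not URLs; "
                f"using {host!r} instead of {entry!r}"
            )
            cleaned.append(host)
        else:
            cleaned.append(entry)
    return cleaned
```

Keeping the hostname as-is is the conservative choice: it never widens the crawl scope, whereas inferring example.com from http://www.example.com would also allow unrelated subdomains.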