Add a note that "allowed_domains" should be a list of domains, not URLs
See original GitHub issue. (Just logging the issue before I forget.)
It may seem obvious from the name of the attribute that allowed_domains is about domain names, but it's not uncommon for Scrapy users to make the mistake of writing allowed_domains = ['http://www.example.com'].
I believe it is worth adding a note in http://doc.scrapy.org/en/latest/topics/spiders.html?#scrapy.spiders.Spider.allowed_domains
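A minimal sketch of the mistake and the fix described above. The class below is illustrative (a plain class rather than a real scrapy.Spider subclass, so it runs without Scrapy installed); the attribute names match Scrapy's Spider API:

```python
# Illustrative spider sketch; with Scrapy installed this would
# subclass scrapy.Spider.
class ExampleSpider:
    name = "example"

    # WRONG: a full URL -- OffsiteMiddleware compares request hostnames
    # against these entries, so a URL here never matches anything:
    # allowed_domains = ["http://www.example.com"]

    # RIGHT: a bare domain name; subdomains such as www.example.com
    # are matched as well.
    allowed_domains = ["example.com"]
    start_urls = ["http://www.example.com/"]
```

With the URL form, every request is silently filtered as offsite, which is why the spider appears to crawl nothing.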
Issue Analytics
- State:
- Created: 7 years ago
- Reactions: 2
- Comments: 11 (8 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
+1 to issue a warning. I'm less sure about inferring the domain: for http://www.example.com, should it infer example.com or www.example.com?

What if, instead of simply documenting this, Scrapy detected this case and issued a warning? Even better, it could extract the domain from the URL and use that, while issuing a warning like:
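A sketch of the behavior proposed in the comments above: detect URL-like entries in allowed_domains, warn, and fall back to the URL's hostname. The function name and warning text are illustrative, not Scrapy's actual implementation, and it resolves the open question above by keeping the full hostname (www.example.com) rather than stripping to the registered domain:

```python
import re
import warnings
from urllib.parse import urlparse

URL_RE = re.compile(r"^https?://", re.IGNORECASE)

def clean_allowed_domains(allowed_domains):
    """Hypothetical helper: replace URL-like entries with their
    hostname and emit a warning, leaving bare domains untouched."""
    cleaned = []
    for entry in allowed_domains:
        if URL_RE.match(entry):
            host = urlparse(entry).hostname
            warnings.warn(
                "allowed_domains accepts domains, not URLs; "
                f"using {host!r} instead of {entry!r}"
            )
            cleaned.append(host)
        else:
            cleaned.append(entry)
    return cleaned
```

Keeping the hostname as-is is the conservative choice: it never widens the crawl scope, whereas inferring example.com from http://www.example.com would also allow unrelated subdomains.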