
Wrong annotations in scrapy.crawler.CrawlerProcess.crawl docstrings


Description

Some IDEs recognize type annotations within docstrings and use them to make suggestions or raise warnings about your code. In the Scrapy source, some of these docstring annotations are wrong.

This is a minor issue. The scrapy.crawler.CrawlerProcess.crawl method expects one mandatory positional argument, crawler_or_spidercls, plus optional positional and keyword arguments passed through arbitrary argument lists (*args and **kwargs). The type annotations for those arbitrary argument lists in the docstring are wrong: https://github.com/scrapy/scrapy/blob/3989f64baa39f7e42b0f798dec15cd250e0fba21/scrapy/crawler.py#L183-L185

When list is used for args and dict for kwargs, the docstring implies that every argument passed through these lists should itself be a list or a dict: according to PEP 484, annotating *args as list and **kwargs as dict makes their deduced types Tuple[list, ...] and Dict[str, dict], i.e. the annotated type applies to each individual value they contain. That does not match the intent of this method. Besides, list and dict could not even be used as generic types in annotations before Python 3.9.
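
To make the PEP 484 rule concrete, here is a minimal, hypothetical illustration (the function below is made up for this example and is not part of Scrapy):

    from typing import Dict, Tuple

    def example(*args: list, **kwargs: dict) -> None:
        # Annotating *args/**kwargs describes each individual value, so the
        # types deduced inside the function body are:
        #   args   -> Tuple[list, ...]  (every positional argument is a list)
        #   kwargs -> Dict[str, dict]   (every keyword value is a dict)
        collected_args: Tuple[list, ...] = args
        collected_kwargs: Dict[str, dict] = kwargs

This is why writing list and dict in the :param: fields of the docstring suggests that every argument forwarded to the spider must itself be a list or a dict.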

Versions

Scrapy       : 2.3.0
lxml         : 4.5.2.0
libxml2      : 2.9.10
cssselect    : 1.1.0
parsel       : 1.6.0
w3lib        : 1.22.0
Twisted      : 20.3.0
Python       : 3.8.5 | packaged by conda-forge | (default, Sep 16 2020, 17:43:11) - [Clang 10.0.1 ]
pyOpenSSL    : 19.1.0 (OpenSSL 1.1.1g  21 Apr 2020)
cryptography : 3.1
Platform     : macOS-10.14.6-x86_64-i386-64bit

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

2 reactions
elacuesta commented, Sep 23, 2020

Makes sense. For the record, we recently added a typing check in the CI process, so annotating the actual function definition is a possibility. Feel free to open a PR if you’d like.
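
A rough sketch of what signature-level annotations could look like, assuming crawler_or_spidercls accepts a Crawler, a Spider subclass or a spider name, and that the method returns a Deferred (the types here are illustrative, not the change that was actually merged):

    from typing import Any, Type, Union

    from scrapy import Spider
    from scrapy.crawler import Crawler
    from twisted.internet.defer import Deferred

    def crawl(
        self,
        crawler_or_spidercls: Union[Type[Spider], str, Crawler],
        *args: Any,
        **kwargs: Any,
    ) -> Deferred:
        # *args/**kwargs are typed Any because they are forwarded verbatim
        # to the spider's __init__.
        ...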

1 reaction
sashreek1 commented, Sep 30, 2020

As far as I am aware, we don’t need to specify the data types for *args or **kwargs. So, would a fix that completely removes the type specifications (list and dict) from the docstring be fine?
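
If the types were simply dropped, the :param: fields might read roughly like this (the wording is illustrative, not the exact Scrapy docstring):

    def crawl(self, crawler_or_spidercls, *args, **kwargs):
        """
        Run a crawler with the provided arguments.

        :param crawler_or_spidercls: already created crawler, or a spider class
            or spider's name inside the project to create it
        :param args: arguments to initialize the spider
        :param kwargs: keyword arguments to initialize the spider
        """
        ...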


Top Results From Across the Web

scrapy crawler showing error while crawling - Stack Overflow
The ERROR: Unable to read instance data, giving, tells you didn't receive any Data from the given URL. Maybe you are Blacklisted. Comment:...

Common Practices — Scrapy 2.7.1 documentation
The first utility you can use to run your spiders is scrapy.crawler. ... process.start() # the script will block here until the crawling...

Frequently Asked Questions — Scrapy 2.7.1 documentation
Scrapy is an application framework for writing web spiders that crawl web sites and extract data from them. Scrapy provides a built-in ...

Core API — Scrapy 2.7.1 documentation
Starts the crawler by instantiating its spider class with the given args and kwargs arguments, while setting the execution engine in motion.

Spiders — Scrapy 2.7.1 documentation
Spiders are classes which define how a certain site (or a group of sites) will be scraped, including how to perform the crawl...
