Adding Type Hints to Scrapy and its Modules

Summary

We should add type hints to Scrapy, including the variable annotations supported since Python 3.6 (PEP 526), to help existing and new contributors and developers understand Scrapy's code.

Motivation

IntelliSense-enabled IDEs such as PyCharm rely on type hints to provide a better experience. For new contributors trying to understand Scrapy comprehensively, type hints are vital.

Consider someone not very familiar with Scrapy stumbling upon the scheduler's constructor:

    def __init__(self, dupefilter, jobdir=None, dqclass=None, mqclass=None,
                 logunser=False, stats=None, pqclass=None, crawler=None):
        self.df = dupefilter
        self.dqdir = self._dqdir(jobdir)
        self.pqclass = pqclass
        self.dqclass = dqclass
        self.mqclass = mqclass
        self.logunser = logunser
        self.stats = stats
        self.crawler = crawler
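For comparison, here is one possible annotated version of that constructor. It is a sketch under stated assumptions, not Scrapy's actual signature: the chosen types (jobdir as an optional path string, the queue arguments as classes rather than instances, stats as a stats collector) are guesses for illustration. The Scrapy imports sit behind TYPE_CHECKING so the sketch runs without Scrapy installed, and _dqdir is a simplified stand-in that only builds a path (the "requests.queue" directory name is illustrative).

```python
from __future__ import annotations  # store annotations as strings (PEP 563)

import os
from typing import TYPE_CHECKING, Optional

if TYPE_CHECKING:
    # Seen only by static type checkers and IDEs; skipped at runtime.
    from scrapy.crawler import Crawler
    from scrapy.dupefilters import BaseDupeFilter
    from scrapy.statscollectors import StatsCollector


class Scheduler:
    def __init__(
        self,
        dupefilter: BaseDupeFilter,
        jobdir: Optional[str] = None,
        dqclass: Optional[type] = None,  # assumed: a disk-queue class, not an instance
        mqclass: Optional[type] = None,  # assumed: a memory-queue class
        logunser: bool = False,
        stats: Optional[StatsCollector] = None,
        pqclass: Optional[type] = None,  # assumed: a priority-queue class
        crawler: Optional[Crawler] = None,
    ) -> None:
        self.df = dupefilter
        self.dqdir: Optional[str] = self._dqdir(jobdir)  # PEP 526 attribute annotation
        self.pqclass = pqclass
        self.dqclass = dqclass
        self.mqclass = mqclass
        self.logunser = logunser
        self.stats = stats
        self.crawler = crawler

    def _dqdir(self, jobdir: Optional[str]) -> Optional[str]:
        # Simplified stand-in: returns a disk-queue path without creating it.
        if jobdir:
            return os.path.join(jobdir, "requests.queue")
        return None
```

With annotations like these, an IDE can tell a newcomer that jobdir is optional and path-like and that logunser is a flag, without tracing every call site.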

Issue Analytics

  • State: open
  • Created 4 years ago
  • Reactions: 9
  • Comments: 20 (11 by maintainers)

Top GitHub Comments

3 reactions
synodriver commented, Jan 3, 2021

Well, I think pydantic would be useful if type hints are added to Scrapy in future versions. It could help validate types and also make it easier to read configuration files and environment variables. With these changes, the code would be easier to understand. For example, items could be defined as subclasses of pydantic.BaseModel, making them more convenient to use, and settings could be defined as subclasses of pydantic.BaseSettings.
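A minimal sketch of the item idea, assuming pydantic is installed; ProductItem and its fields are hypothetical. It shows only pydantic's own type coercion and validation, not any Scrapy integration (the Scrapy 2.7 documentation lists dicts, Item, dataclass and attrs objects as supported item types). BaseSettings is left out because pydantic v2 moved it into the separate pydantic-settings package.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class ProductItem(BaseModel):
    """Hypothetical scraped item whose field types are enforced by pydantic."""

    name: str
    price: float
    url: Optional[str] = None


# A numeric string scraped from a page is coerced to float.
item = ProductItem(name="Widget", price="9.99")

# A value that cannot be converted raises ValidationError instead of slipping through.
try:
    ProductItem(name="Broken", price="not a number")
    rejected = False
except ValidationError:
    rejected = True
```

The same declarations double as type hints for IDEs, which is the overlap with this issue that the comment points at.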

2 reactions
grammy-jiang commented, Dec 16, 2019

Considering there are already many thousands of lines of code in Scrapy, MonkeyType may be a good choice for automatically adding type hints to existing code:

MonkeyType collects runtime types of function arguments and return values, and can automatically generate stub files or even add draft type annotations directly to your Python code based on the types collected at runtime.
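To make "collecting runtime types" concrete, here is a toy version built on a decorator. It is not how MonkeyType works internally (MonkeyType hooks into Python's sys.setprofile instead of requiring decorators), and record_types, OBSERVED and parse_price are hypothetical names used only for illustration.

```python
import functools
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Set, Tuple
from unittest.mock import MagicMock

# Function name -> set of observed (positional argument type names, return type name).
OBSERVED: DefaultDict[str, Set[Tuple[Tuple[str, ...], str]]] = defaultdict(set)


def record_types(func: Callable[..., Any]) -> Callable[..., Any]:
    """Toy tracer: record runtime types of positional args and the return value."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        result = func(*args, **kwargs)
        arg_types = tuple(type(a).__name__ for a in args)
        OBSERVED[func.__name__].add((arg_types, type(result).__name__))
        return result

    return wrapper


@record_types
def parse_price(raw):
    # Unannotated function whose types we want to discover at runtime.
    return float(raw.strip("$"))


parse_price("$9.99")
parse_price("12")
# A mocked argument, as a test suite might pass, also gets recorded.
parse_price(MagicMock())
```

After these calls, the recorded argument types for parse_price include MagicMock next to str, which is the same kind of pollution from mocked test inputs described in the findings that follow.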

Before each type is added manually, I think this can give us a useful head start on the type hints.

I have tested MonkeyType and reached the following conclusions:

  • MonkeyType ONLY adds type hints to the inputs (arguments) and outputs (return values) of methods that are exercised by the test cases; variables inside methods are never annotated
  • Because the inputs (arguments) may be mocked in the test cases, the type hints may include the mocked objects' types, which is inaccurate, and the imports get polluted as well, e.g.:
    ...
    from unittest.mock import MagicMock
    ...
    class RobotsTxtMiddleware(object):
        DOWNLOAD_PRIORITY = 1000

        def __init__(self, crawler: Union[Crawler, MagicMock]) -> None:
            ...
  • For the from_crawler classmethod, MonkeyType adds a return type hint naming the class itself, but it can't add from __future__ import annotations automatically, so that annotation refers to the class before its name is defined when the class body runs
  • Some imports are added for the type hints, but they may duplicate existing ones: MonkeyType imports each name from its most specific module path and ignores what is already imported, e.g.:
    ...
    from scrapy.http import Request  # original imports
    ...
    from scrapy.http.request import Request  # added by MonkeyType
    ...
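The from_crawler problem from the list above can be fixed by adding the __future__ import by hand (quoting the class name, as in -> "RobotsTxtMiddleware", works too). Below is a stripped-down sketch, assuming Python 3.7+ for PEP 563; the class is a simplified stand-in, not Scrapy's actual RobotsTxtMiddleware.

```python
from __future__ import annotations  # PEP 563: annotations are stored as strings

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only type checkers resolve this import, so the sketch runs without Scrapy.
    from scrapy.crawler import Crawler


class RobotsTxtMiddleware:
    DOWNLOAD_PRIORITY = 1000

    def __init__(self, crawler: Crawler) -> None:
        self.crawler = crawler

    @classmethod
    def from_crawler(cls, crawler: Crawler) -> RobotsTxtMiddleware:
        # Without the __future__ import, this return annotation would be evaluated
        # while the class body is still executing, before RobotsTxtMiddleware exists,
        # and raise NameError.
        return cls(crawler)
```

Postponed evaluation also means the TYPE_CHECKING-only Crawler import never has to resolve at runtime, which avoids adding runtime import cycles just for annotations.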

It seems MonkeyType can help with adding type hints to some extent, but manually checking the result is still necessary.
