Scrapy as a library
Scrapy currently assumes, in much of its functionality, that it is used as a framework. Work has been done in the past, and is ongoing, to make it more usable as a library as well. I would like to see even more change in this regard.
Is it feasible to consider decoupling Scrapy's Django-like concept of a "project" in the filesystem from "Scrapy core", possibly moving the behavior of creating a Scrapy project (and all that entails, such as file templates) into a separate project? Or, conversely, creating a libscrapy project for the bare essentials?
A lot of Scrapy's environment configuration is done through the Settings object. All configurable classes/functions that are defined through settings (anything that reads like scrapy.downloadermiddlewares.redirect.RedirectMiddleware) get loaded through load_object, which in turn leverages importlib.import_module to find a module and pull in the code.
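For context, that resolution step boils down to roughly the following. This is a simplified sketch of the idea, not Scrapy's exact implementation, and the stdlib path used below is just a stand-in for a real middleware path:

```python
import importlib

def load_object(path):
    """Resolve a dotted path such as 'package.module.ClassName'
    to the object it names."""
    module_path, _, name = path.rpartition(".")
    module = importlib.import_module(module_path)
    return getattr(module, name)

# Stand-in for something like
# 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware':
decoder_cls = load_object("json.decoder.JSONDecoder")
```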
In a scripted environment we usually just want to load an already defined or imported class into Scrapy, such as an ItemPipeline, and I would like to see support for loading objects directly in some manner there.
/edit: Being discussed in #1215 (thanks @kmike)
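To illustrate the request, one hypothetical shape for this (a sketch of the desired behavior, not Scrapy's actual API) would be a load_object variant that passes already-imported objects through unchanged, so settings could hold either dotted-path strings or the objects themselves:

```python
import importlib

def load_object(path_or_object):
    # Hypothetical: if we were handed an actual object rather than a
    # dotted-path string, return it as-is instead of importing anything.
    if not isinstance(path_or_object, str):
        return path_or_object
    module_path, _, name = path_or_object.rpartition(".")
    return getattr(importlib.import_module(module_path), name)

class MyPipeline:  # a pipeline class defined right in the script
    def process_item(self, item, spider):
        return item

# Both styles would then resolve to a usable class:
pipeline_cls = load_object(MyPipeline)
ordered_dict_cls = load_object("collections.OrderedDict")
```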
I hope this is not too “out there”.
(And if someone has other scrapy-as-a-library-related considerations, they are welcome to add them here as well.)
Issue Analytics: Created 8 years ago · Reactions: 2 · Comments: 8 (5 by maintainers)
Top GitHub Comments
Nice to see this discussion pop up.
Since Scrapy is mostly useful for its scheduler/downloader plus middlewares, I think using Scrapy as a library needs something less abstract than CrawlerProcess, but I can't imagine how this would be anything other than Twisted. Maybe using a crawler with its configured downloader and middleware stack, without having to use a project and write spiders. I think #1349 can relate to this.
Just the bare minimum that can be extended as the user sees fit.