
Scrapy as a library

See original GitHub issue

Much of Scrapy's functionality currently assumes that it is being used as a framework.

Work has been done in the past, and is ongoing, to make it more usable as a library as well. I would like to see even more change in this regard.

Is it feasible to consider decoupling Scrapy's Django-like concept of a "project" in the filesystem from "scrapy core" – possibly moving the behavior of creating a 'scrapy project' (and all that entails, such as file templates) into a separate project? Or, conversely, creating a libscrapy project containing only the bare essentials?

A lot of Scrapy's environment configuration is done through the Settings object. All configurable classes/functions defined through settings (anything that reads like scrapy.downloadermiddlewares.redirect.RedirectMiddleware) get loaded through load_object, which in turn leverages importlib.import_module to find the module and pull in the code.
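The mechanism described above can be sketched in a few lines. This is a minimal illustration of how a dotted-path loader in the spirit of Scrapy's load_object works, not Scrapy's actual implementation; load_dotted_path is a hypothetical name, and json.decoder.JSONDecoder is used as a stand-in for a middleware path:

```python
import importlib

def load_dotted_path(path):
    # Split "pkg.module.Name" into a module path and an attribute name,
    # import the module, and pull the attribute off it -- the same basic
    # approach Scrapy's load_object takes for settings strings.
    module_path, _, name = path.rpartition(".")
    module = importlib.import_module(module_path)
    return getattr(module, name)

# Resolve a class from its string path, as Scrapy does for entries in
# settings such as DOWNLOADER_MIDDLEWARES.
decoder_cls = load_dotted_path("json.decoder.JSONDecoder")
print(decoder_cls.__name__)  # JSONDecoder
```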

In a scripted environment we usually just want to hand Scrapy an already defined or imported class, such as an item pipeline, and I would like to see support for loading such objects directly in some manner.
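The behaviour being requested could look something like the following sketch: a resolver that passes already-imported objects through unchanged and only falls back to dotted-path importing for strings. Both resolve_component and MyPipeline are hypothetical names for illustration; this is not Scrapy's API:

```python
import importlib

class MyPipeline:
    """An already-imported class we want to hand to Scrapy directly."""
    pass

def resolve_component(obj_or_path):
    # Hypothetical helper: accept a live object as-is, and only do a
    # dotted-path import when given a string.
    if not isinstance(obj_or_path, str):
        return obj_or_path
    module_path, _, name = obj_or_path.rpartition(".")
    return getattr(importlib.import_module(module_path), name)

assert resolve_component(MyPipeline) is MyPipeline
assert resolve_component("json.decoder.JSONDecoder").__name__ == "JSONDecoder"
```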

/edit: Being discussed in #1215 (thanks @kmike)

I hope this is not too "out there". (And if someone has other scrapy-as-a-library-related considerations, they are welcome to add them here as well.)

Issue Analytics

  • State: open
  • Created: 8 years ago
  • Reactions: 2
  • Comments: 8 (5 by maintainers)

Top GitHub Comments

1 reaction
Digenis commented, Feb 15, 2016

Nice to see this discussion pop up.

Since Scrapy is mostly useful for its scheduler/downloader plus middlewares, I think using Scrapy as a library needs something less abstract than CrawlerProcess, but I can't imagine how this would be anything other than Twisted. Maybe a crawler with its configured downloader and middleware stack, usable without having to create a project and write spiders. I think #1349 relates to this.

0 reactions
sajattack commented, Jun 12, 2016

Just the bare minimum that can be extended as the user sees fit.
