
Scrapy as a library

See original GitHub issue

Much of Scrapy's functionality currently assumes that it is being used as a framework.

Work has been done in the past, and is ongoing, to make it more usable as a library as well. I would like to see even more change in this regard.

Is it feasible to consider decoupling Scrapy's Django-like concept of a "project" in the filesystem from "scrapy core" – possibly moving the behavior of creating a 'scrapy project' (and all that entails, such as file templates) into a separate project? Or, conversely, creating a libscrapy project containing only the bare essentials?

A lot of Scrapy's environment configuration is done through the Settings object. All configurable classes/functions defined through settings (anything that reads like scrapy.downloadermiddlewares.redirect.RedirectMiddleware) get loaded through load_object, which in turn leverages importlib.import_module to find the module and pull in the code.
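The mechanism described above can be sketched in a few lines. This is a minimal illustration of how a dotted-path loader in the spirit of Scrapy's load_object works, not Scrapy's actual implementation; load_dotted_path is a hypothetical name, and json.decoder.JSONDecoder is used as a stand-in for a middleware path:

```python
import importlib

def load_dotted_path(path):
    # Split "pkg.module.Name" into a module path and an attribute name,
    # import the module, and pull the attribute off it -- the same basic
    # approach Scrapy's load_object takes for settings strings.
    module_path, _, name = path.rpartition(".")
    module = importlib.import_module(module_path)
    return getattr(module, name)

# Resolve a class from its string path, as Scrapy does for entries in
# settings such as DOWNLOADER_MIDDLEWARES.
decoder_cls = load_dotted_path("json.decoder.JSONDecoder")
print(decoder_cls.__name__)  # JSONDecoder
```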

In a scripted environment we usually just want to hand Scrapy an already defined or imported class, such as an item pipeline, and I would like to see support for loading such objects directly in some manner.
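The behaviour being requested could look something like the following sketch: a resolver that passes already-imported objects through unchanged and only falls back to dotted-path importing for strings. Both resolve_component and MyPipeline are hypothetical names for illustration; this is not Scrapy's API:

```python
import importlib

class MyPipeline:
    """An already-imported class we want to hand to Scrapy directly."""
    pass

def resolve_component(obj_or_path):
    # Hypothetical helper: accept a live object as-is, and only do a
    # dotted-path import when given a string.
    if not isinstance(obj_or_path, str):
        return obj_or_path
    module_path, _, name = obj_or_path.rpartition(".")
    return getattr(importlib.import_module(module_path), name)

assert resolve_component(MyPipeline) is MyPipeline
assert resolve_component("json.decoder.JSONDecoder").__name__ == "JSONDecoder"
```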

/edit: Being discussed in #1215 (thanks @kmike)

I hope this is not too "out there". (And if someone has other scrapy-as-a-library-related considerations, they are welcome to add them here as well.)

Issue Analytics

  • State: open
  • Created: 8 years ago
  • Reactions: 2
  • Comments: 8 (5 by maintainers)

Top GitHub Comments

1 reaction
Digenis commented, Feb 15, 2016

Nice to see this discussion pop up.

Since Scrapy is mostly useful for its scheduler/downloader plus middlewares, I think using Scrapy as a library needs something less abstract than CrawlerProcess, but I can't imagine how this would be anything other than Twisted. Maybe a crawler with its configured downloader and middleware stack, usable without having to create a project and write spiders. I think #1349 relates to this.

0 reactions
sajattack commented, Jun 12, 2016

Just the bare minimum that can be extended as the user sees fit.
