Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging third-party libraries. It collects links to all the places you might be looking while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

CrawlerProcess doesn't load Item Pipeline component

See original GitHub issue

If I run scrapy crawl spider_name, everything works fine. But when I use CrawlerProcess to run my spider, I found that CrawlerProcess doesn't load the Item Pipeline component!

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments:14 (6 by maintainers)

Top GitHub Comments

33 reactions
swizzard commented, Apr 4, 2016

@zouge if you’re using CrawlerProcess outside the ‘normal’ command-line process, you have to load in your settings yourself:

from scrapy.crawler import CrawlerProcess
from scrapy.settings import Settings

from my_project import settings as my_settings

crawler_settings = Settings()
crawler_settings.setmodule(my_settings)
process = CrawlerProcess(settings=crawler_settings)

8 reactions
Gallaecio commented, Sep 27, 2019

@1315groop I’m sure that, if you check the return value of get_project_settings(), you will find it is empty.

get_project_settings() only works if the current working directory is inside a Scrapy project. You must either change the current working directory accordingly before calling get_project_settings(), or pass the settings in a different way (e.g. a manually defined dictionary of settings). See https://stackoverflow.com/q/31662797/939364

Read more comments on GitHub >

Top Results From Across the Web

Scrapy enabling item pipeline - Stack Overflow
How do I enable item pipeline if I define the ItemPipeline class in the same file as my spider. I tried the following...

Item Pipeline — Scrapy 2.7.1 documentation
Each item pipeline component (sometimes referred to as just “Item Pipeline”) is a Python class that implements a simple method.

Scrapy Item Pipelines Not Enabling - ADocLib
Solving specific problems This object provides access to all Scrapy core components and it's the only The Extension Manager is responsible for loading...

How to run Scrapy spiders in your Python program
The Crawler object provides access to all Scrapy core components, ... the same process and they will not start running until the start()...

Common Practices — Scrapy documentation - Read the Docs
import scrapy from scrapy.crawler import CrawlerProcess class MySpider(scrapy. ... but it won't start or interfere with existing reactors in any way.
