CrawlerProcess doesn't load Item Pipeline component

If I run scrapy crawl spider_name, everything works fine. But when I use CrawlerProcess to run my spider, I find that CrawlerProcess doesn’t load the Item Pipeline component!

Issue Analytics

State:
Created 7 years ago
Comments: 14 (6 by maintainers)

Top GitHub Comments

33 reactions

swizzard commented, Apr 4, 2016

@zouge if you’re using CrawlerProcess outside the ‘normal’ command-line process, you have to load in your settings yourself:

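from scrapy.crawler import CrawlerProcess
from scrapy.settings import Settings

# The project's settings module (where ITEM_PIPELINES is defined)
from my_project import settings as my_settings

# Start from Scrapy's defaults, then overlay the project settings
crawler_settings = Settings()
crawler_settings.setmodule(my_settings)
# Pass the populated settings explicitly so the pipelines get loaded
process = CrawlerProcess(settings=crawler_settings)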
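
For anyone who wants to confirm that the pipelines really made it into the settings before the crawl starts, here is a minimal sketch built on the snippet above. The helper names enabled_pipelines and run_with_settings_module are made up for illustration and are not part of Scrapy. Raising an error when ITEM_PIPELINES is empty is just one possible way to surface the problem, since a project can legitimately have no pipelines.

```python
def enabled_pipelines(item_pipelines):
    """Return pipeline paths sorted by order value, skipping entries set to None.

    Hypothetical helper: it only inspects a plain ITEM_PIPELINES-style dict.
    """
    active = {path: order for path, order in item_pipelines.items() if order is not None}
    return sorted(active, key=active.get)


def run_with_settings_module(spider_cls, settings_module):
    """Run spider_cls with settings taken from an already imported settings module."""
    # Scrapy is imported lazily so the helper above can be used without it.
    from scrapy.crawler import CrawlerProcess
    from scrapy.settings import Settings

    crawler_settings = Settings()
    crawler_settings.setmodule(settings_module)

    # Fail early if the pipelines did not make it into the settings.
    pipelines = enabled_pipelines(crawler_settings.getdict("ITEM_PIPELINES"))
    if not pipelines:
        raise RuntimeError("ITEM_PIPELINES is empty; were the project settings loaded?")

    process = CrawlerProcess(settings=crawler_settings)
    process.crawl(spider_cls)
    process.start()  # blocks until the crawl finishes
```

With the names from the comment above, this would be called as run_with_settings_module(MySpider, my_settings), where MySpider stands in for your own spider class.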

8 reactions

Gallaecio commented, Sep 27, 2019

@1315groop I’m sure that if you check the return value of get_project_settings(), you will find it is empty.

get_project_settings() only works if the current working directory is a Scrapy project. You must either change the current working directory accordingly before calling get_project_settings() or pass the settings in a different way (e.g. a manually-defined dictionary of settings). See https://stackoverflow.com/q/31662797/939364
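
A rough sketch of the first option (changing the working directory before the call) could look like the following. The helper names find_project_root, working_directory and load_project_settings are invented here and are not Scrapy APIs. The sketch assumes that a Scrapy project is identified by a scrapy.cfg file at its root.

```python
import os
from contextlib import contextmanager
from pathlib import Path


def find_project_root(start):
    """Return the closest directory at or above `start` that holds scrapy.cfg, or None."""
    current = Path(start).resolve()
    for candidate in (current, *current.parents):
        if (candidate / "scrapy.cfg").is_file():
            return candidate
    return None


@contextmanager
def working_directory(path):
    """Temporarily switch the current working directory, restoring it afterwards."""
    previous = Path.cwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)


def load_project_settings(project_dir):
    """Call get_project_settings() from inside the project directory."""
    # Imported lazily so the helpers above work without Scrapy installed.
    from scrapy.utils.project import get_project_settings

    root = find_project_root(project_dir)
    if root is None:
        raise FileNotFoundError(f"no scrapy.cfg found at or above {project_dir}")
    with working_directory(root):
        return get_project_settings()
```

The returned settings object can then be passed to CrawlerProcess(settings=...). If the result still looks empty, checking its ITEM_PIPELINES value is a quick way to see whether the project settings were picked up.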
