
Allow multiple items through pipelines?

See original GitHub issue

The documentation for item pipelines specifies that process_item must either return a dict with data, return an Item object, or raise a DropItem exception. Is there a reason why we aren’t allowed to return an iterable of dicts (or Item objects)? Under the current framework it seems impossible to write a pipeline that takes one input item and emits multiple items.

Thank you!
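
For context, the contract described above looks like this in practice (a minimal sketch; the pipeline name and the price check are made-up illustrations, but process_item(self, item, spider) and DropItem are the real Scrapy interface):

from scrapy.exceptions import DropItem

class ValidatePricePipeline:
    # Current contract: process_item must return exactly one item
    # (a dict or an Item) or raise DropItem - there is no supported
    # way to return several items from one input item.
    def process_item(self, item, spider):
        if not item.get("price"):
            raise DropItem("missing price")
        item["price"] = round(item["price"], 2)
        return item  # one item out; returning a list is not supported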

Issue Analytics

  • State: closed
  • Created: 7 years ago
  • Reactions: 3
  • Comments: 24 (10 by maintainers)

Top GitHub Comments

7 reactions
kmike commented, May 16, 2016
  1. A pipeline has to be able to return either a single object or an iterable collection of them. It also has to accept both; otherwise it wouldn’t make sense.

For me it makes more sense for a pipeline to accept a single item, but return either a single item or a list/iterable. Before:

item1 --> [pipeline A] --> item1 --> [pipeline B] --> ...
item2 --> [pipeline A] --x raise DropItem()

New feature:

item3 --> [pipeline A],--> item4 --> [pipeline B] --> ...
                      '--> item5 --> [pipeline B] --> ...
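
Scrapy’s built-in ItemPipelineManager does not implement this, but the fan-out kmike sketches could work roughly as follows (a hypothetical sketch; run_pipelines and the type checks are illustrative, not Scrapy API):

import types

from scrapy.exceptions import DropItem

def run_pipelines(item, spider, pipelines):
    # Feed one item in; let each pipeline return one item or many.
    items = [item]
    for pipeline in pipelines:
        next_items = []
        for current in items:
            try:
                result = pipeline.process_item(current, spider)
            except DropItem:
                continue  # the item2 case in the diagram above
            # Treat lists, tuples and generators as fan-out; anything
            # else (a dict or an Item) passes through as a single item.
            if isinstance(result, (list, tuple, types.GeneratorType)):
                next_items.extend(result)  # the item3 -> item4/item5 case
            else:
                next_items.append(result)
        items = next_items
    return items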

3 reactions
dxue2012 commented, May 10, 2016

Hi all,

I ended up defining a custom post-processing step after ItemPipelines in the following manner:

  1. Store all the items in MongoDB at the end of Pipelines
  2. Read all items from MongoDB, and do more stuff on the data with “Processors”

Each processor (analogous to a pipeline) defines a function called process_iter_items, which takes an iterable of dicts and must return an iterable of dicts. The set of processors is managed by BatchProcessorManager, a MiddlewareManager subclass similar to ItemPipelineManager, which chains process_iter_items functions instead of process_item functions.

The chain of process_iter_items is connected to the signal emitted by the last ItemPipeline that stores the items in MongoDB.
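
A minimal sketch of that design (BatchProcessorManager and process_iter_items are the names from the comment; the example processors and the plain-Python chaining are illustrative, since the original builds on Scrapy’s MiddlewareManager):

class DedupeProcessor:
    # Example processor: N items in, at most N items out.
    def process_iter_items(self, items):
        seen = set()
        for item in items:
            if item["url"] not in seen:
                seen.add(item["url"])
                yield item

class ExplodeVariantsProcessor:
    # Example processor: one item in, several items out - exactly the
    # case a plain process_item pipeline cannot express.
    def process_iter_items(self, items):
        for item in items:
            for variant in item.get("variants", [None]):
                yield {**item, "variant": variant}

class BatchProcessorManager:
    # Chains process_iter_items calls, analogous to how Scrapy's
    # ItemPipelineManager chains process_item calls.
    def __init__(self, *processors):
        self.processors = processors

    def process(self, items):
        for processor in self.processors:
            items = processor.process_iter_items(items)
        return items

Items read back from storage would be fed in as the initial iterable, e.g. list(manager.process(collection.find())) with a pymongo collection.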

Read more comments on GitHub >

Top Results From Across the Web

Scrapy, Python: Multiple Item Classes in one pipeline?
You can have one pipeline handle only one type of item request, though, if handling that item type is unique, by checking the...

Check out multiple repositories in your pipeline - Microsoft Learn
Pipelines often rely on multiple repositories that contain source, tools, scripts, or other items that you need to build your code.

Item Pipeline — Scrapy 2.7.1 documentation
Write items to a JSON lines file. The following pipeline stores all scraped items (from all spiders) into a single items.jsonl file, containing...

Downstream pipelines - GitLab Docs
A pipeline in one project can trigger downstream pipelines in another project, called multi-project pipelines. The user triggering the upstream pipeline must be...

Item Pipeline - Scrapy documentation - Read the Docs
After an item has been scraped by a spider, it is sent to the Item Pipeline which processes it through several components that...
