Allow multiple items through pipelines?
The documentation for item pipelines specifies that process_item must either return a dict with data or an Item object, or raise a DropItem exception. Is there a reason why we aren't allowed to return an iterable of dictionaries with data (or Item objects)? It seems impossible to write a pipeline that modifies the input item and returns multiple items under the current framework (see the sketch below for the kind of pipeline this would enable).
Thank you!
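For illustration, this is the kind of pipeline the question describes, which the documented contract rules out. A sketch only; the class name and item fields (SplitVariantsPipeline, product, variants) are hypothetical examples, not from the issue.

```python
from scrapy.exceptions import DropItem

class SplitVariantsPipeline:
    def process_item(self, item, spider):
        if not item.get("variants"):
            raise DropItem("nothing to split")
        # Invalid under the current contract: process_item must return
        # a single dict/Item (or raise DropItem), not a list of items.
        return [
            {"product": item["product"], "variant": variant}
            for variant in item["variants"]
        ]
```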
Issue Analytics
- Created: 7 years ago
- Reactions: 3
- Comments: 24 (10 by maintainers)
Top Results From Across the Web
Scrapy, Python: Multiple Item Classes in one pipeline?
You can have one pipeline handle only one type of item request, though, if handling that item type is unique, by checking the...

Check out multiple repositories in your pipeline - Microsoft Learn
Pipelines often rely on multiple repositories that contain source, tools, scripts, or other items that you need to build your code.

Item Pipeline — Scrapy 2.7.1 documentation
Write items to a JSON lines file. The following pipeline stores all scraped items (from all spiders) into a single items.jsonl file, containing...

Downstream pipelines - GitLab Docs
A pipeline in one project can trigger downstream pipelines in another project, called multi-project pipelines. The user triggering the upstream pipeline must be...

Item Pipeline - Scrapy documentation - Read the Docs
After an item has been scraped by a spider, it is sent to the Item Pipeline which processes it through several components that...
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
For me it makes more sense for the pipeline to accept a single item, but return either a single item or a list/iterable. Before and after, sketched below:
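The comment's original code snippets did not survive extraction; here is a minimal sketch of the before/after contract it describes. The pipeline classes and item fields (NormalizePricePipeline, ExpandVariantsPipeline, price, variants) are hypothetical examples, not Scrapy API.

```python
# Before: process_item returns exactly one item (or raises DropItem).
class NormalizePricePipeline:
    def process_item(self, item, spider):
        item["price"] = float(item["price"])  # modify in place
        return item                           # exactly one item out

# New feature: process_item may instead return a list/iterable,
# so one input item can fan out into several output items.
class ExpandVariantsPipeline:
    def process_item(self, item, spider):
        return [
            {"product": item["product"], "variant": variant}
            for variant in item["variants"]
        ]
```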
Hi all,
I ended up defining a custom post-processing step after ItemPipelines in the following manner: each processor (analogous to a pipeline) defines a function called process_iter_items, which takes an iterable of dicts and must return an iterable of dicts. The set of processors is managed by BatchProcessorManager, a MiddlewareManager subclass similar to ItemPipelineManager, which supports the chaining of process_iter_items functions instead of process_item functions. The chain of process_iter_items is connected to the signal emitted by the last ItemPipeline, which stores the items in MongoDB.
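A minimal sketch of that chaining idea follows. It mirrors the names in the comment (BatchProcessorManager, process_iter_items), but the implementation and the two example processors are assumptions, not the commenter's actual code, and the Scrapy signal wiring to MongoDB is omitted.

```python
from typing import Callable, Dict, Iterable

def deduplicate(items: Iterable[Dict]) -> Iterable[Dict]:
    # Example processor: drop items already seen, keyed by URL.
    seen = set()
    for item in items:
        if item.get("url") not in seen:
            seen.add(item.get("url"))
            yield item

def split_tags(items: Iterable[Dict]) -> Iterable[Dict]:
    # Example processor: fan one item out into one item per tag,
    # which a plain process_item chain cannot express.
    for item in items:
        for tag in item.get("tags", []):
            yield {**item, "tag": tag}

class BatchProcessorManager:
    """Chains process_iter_items-style functions, analogous to how
    ItemPipelineManager chains process_item calls."""

    def __init__(self, *processors: Callable[[Iterable[Dict]], Iterable[Dict]]):
        self.processors = list(processors)

    def process_iter_items(self, items: Iterable[Dict]) -> Iterable[Dict]:
        for processor in self.processors:
            items = processor(items)  # compose the generators lazily
        return items

manager = BatchProcessorManager(deduplicate, split_tags)
results = list(manager.process_iter_items([
    {"url": "https://example.com/a", "tags": ["x", "y"]},
    {"url": "https://example.com/a", "tags": ["x", "y"]},  # duplicate, dropped
]))
# results: two items, one per tag of the first (non-duplicate) input item
```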