`ItemLoader` fields initialized from `item` are reprocessed

Description

#3804 introduced a bug where ItemLoader fields initialized from an existing item are reprocessed when the item is loaded again. Related: #3897.

Steps to Reproduce

from pprint import pprint

from scrapy import Field, Item
from scrapy.loader import ItemLoader
from scrapy.loader.processors import TakeFirst


class X(Item):
    x = Field(output_processor=TakeFirst())


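# First pass: collect two values and let TakeFirst pick the first one.
loader = ItemLoader(X())
loader.add_value("x", ["value1", "value2"])
x = loader.load_item()
pprint(x)
# {'x': 'value1'}

# Second pass: build a new loader from the already-loaded item.
pprint(ItemLoader(x).load_item())
# {'x': 'v'}  (expected: {'x': 'value1'})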

Expected behavior: ItemLoader initialized from the x item does not reprocess its fields and loads {'x': 'value1'}.

Actual behavior: ItemLoader initialized from the x item reprocesses its fields and loads {'x': 'v'}.
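The 'v' result is consistent with the output processor running a second time on the already-processed string rather than on a list of values. The issue text does not spell out the mechanism, so the following is only a sketch of that reading: `take_first` is a hypothetical, simplified stand-in for Scrapy's `TakeFirst`, not the library's code, and no Scrapy import is needed to run it.

```python
def take_first(values):
    """Simplified stand-in for TakeFirst: return the first non-empty value."""
    for value in values:
        if value is not None and value != "":
            return value


# First load: the collected values are a list, so the first element is returned.
first_pass = take_first(["value1", "value2"])
print(first_pass)  # value1

# Second load (assumed mechanism): the already-processed string is fed through
# the same processor, and iterating over a string yields single characters.
second_pass = take_first(first_pass)
print(second_pass)  # v
```

Under this assumption, any output processor that iterates over its input would be affected the same way when applied to a string it already produced.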

Versions

Scrapy       : 1.7.3
lxml         : 4.4.1.0
libxml2      : 2.9.9
cssselect    : 1.1.0
parsel       : 1.5.2
w3lib        : 1.21.0
Twisted      : 19.7.0
Python       : 3.6.5 (default, May  3 2018, 10:08:28) - [GCC 5.4.0 20160609]
pyOpenSSL    : 19.0.0 (OpenSSL 1.1.1c  28 May 2019)
cryptography : 2.7
Platform     : Linux-4.4.0-127-generic-x86_64-with-LinuxMint-18.1-serena

Additional context

Here’s the behavior of the previous version:

Scrapy       : 1.6.0
lxml         : 4.4.0.0
libxml2      : 2.9.9
cssselect    : 1.0.3
parsel       : 1.5.1
w3lib        : 1.20.0
Twisted      : 19.7.0
Python       : 3.6.5 (default, May  3 2018, 10:08:28) - [GCC 5.4.0 20160609]
pyOpenSSL    : 19.0.0 (OpenSSL 1.1.1c  28 May 2019)
cryptography : 2.7
Platform     : Linux-4.4.0-127-generic-x86_64-with-LinuxMint-18.1-serena

Output of the same script on this version:

# {'x': 'value1'}
# {'x': 'value1'}

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Reactions: 3
  • Comments: 9 (7 by maintainers)

Top GitHub Comments

sortafreel commented, Aug 27, 2019 (4 reactions)

Checked, fixable, will ship new pull request with new tests this week.

sortafreel commented, Aug 26, 2019 (4 reactions)

Yeah, it was mostly because of failing test. Just came back from my vacation, will take a fresh look.
