Centralized Request fingerprints
It is very easy to introduce a subtle bug when using a custom duplicates filter that changes how the request fingerprint is calculated.
- The duplicates filter checks the request fingerprint and makes the Scheduler drop the request if it is a duplicate.
- The cache storage checks the request fingerprint and fetches the response from the cache if it is a duplicate.
- If the two fingerprint algorithms differ, we're in trouble: the Scheduler and the cache no longer agree on which requests are the same.
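To make the failure mode concrete, here is a minimal sketch in plain Python. It does not use Scrapy itself; `DupeFilter`, `CacheStorage`, both fingerprint functions and the `X-Token` header are hypothetical stand-ins invented for illustration. The duplicates filter uses a header-aware fingerprint, while the cache still uses a URL-only one.

```python
import hashlib


def default_fingerprint(method, url, headers=None):
    """Stand-in for a default fingerprint: hashes only method and URL."""
    h = hashlib.sha1()
    h.update(method.encode("utf-8"))
    h.update(b"\x00")
    h.update(url.encode("utf-8"))
    return h.hexdigest()


def header_aware_fingerprint(method, url, headers=None):
    """Custom fingerprint that also hashes a (hypothetical) X-Token header."""
    h = hashlib.sha1()
    h.update(default_fingerprint(method, url).encode("utf-8"))
    h.update((headers or {}).get("X-Token", "").encode("utf-8"))
    return h.hexdigest()


class DupeFilter:
    """Toy duplicates filter: remembers the fingerprints it has seen."""

    def __init__(self, fingerprint):
        self.fingerprint = fingerprint
        self.seen = set()

    def request_seen(self, method, url, headers=None):
        fp = self.fingerprint(method, url, headers)
        if fp in self.seen:
            return True
        self.seen.add(fp)
        return False


class CacheStorage:
    """Toy cache storage: maps fingerprints to response bodies."""

    def __init__(self, fingerprint):
        self.fingerprint = fingerprint
        self.responses = {}

    def store(self, method, url, headers, body):
        self.responses[self.fingerprint(method, url, headers)] = body

    def retrieve(self, method, url, headers=None):
        return self.responses.get(self.fingerprint(method, url, headers))


# Only the duplicates filter was customized; the cache was forgotten.
dupefilter = DupeFilter(header_aware_fingerprint)
cache = CacheStorage(default_fingerprint)

alice = ("GET", "https://example.com/profile", {"X-Token": "alice"})
bob = ("GET", "https://example.com/profile", {"X-Token": "bob"})

# Both requests pass the duplicates filter, as intended.
alice_is_dupe = dupefilter.request_seen(*alice)
bob_is_dupe = dupefilter.request_seen(*bob)

# The response to the first request is cached under a URL-only fingerprint,
# so the second request is served the first request's response.
cache.store(*alice, body="profile of alice")
served_to_bob = cache.retrieve(*bob)
```

Neither component raises an error here; the second request just silently receives the wrong cached response, which is what makes this kind of bug hard to notice.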
The problem is that there is no way to override the request fingerprint globally; to make Scrapy always take something extra into account (an HTTP header, a meta option), the user must override the duplicates filter and all cache storages that are in use.
Ideas about how to fix it:
- Use the duplicates filter's request_fingerprint method in cache storage if this method is available;
- create a special Request.meta key that the request_fingerprint function will take into account;
- create a special Request.meta key that will allow providing a pre-calculated fingerprint;
- add a settings.py option to override request fingerprint function globally.
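As a rough illustration of the two Request.meta ideas, the sketch below shows one fingerprint function that every component could share. It is plain Python, not Scrapy code: the meta key names `fingerprint` and `fingerprint_extra` are made up for this example, and the function signature is an assumption rather than an existing API.

```python
import hashlib

# Hypothetical meta key names, chosen only for this illustration.
FINGERPRINT_KEY = "fingerprint"              # pre-calculated fingerprint
FINGERPRINT_EXTRA_KEY = "fingerprint_extra"  # extra values to hash


def request_fingerprint(method, url, meta=None):
    """Single fingerprint function meant to be shared by all components."""
    meta = meta or {}

    # A pre-calculated fingerprint in meta wins over everything else.
    if FINGERPRINT_KEY in meta:
        return meta[FINGERPRINT_KEY]

    # Otherwise hash method, URL and any extra values listed in meta.
    h = hashlib.sha1()
    extra = meta.get(FINGERPRINT_EXTRA_KEY, ())
    for part in (method, url, *extra):
        h.update(str(part).encode("utf-8"))
        h.update(b"\x00")  # separator so adjacent parts cannot run together
    return h.hexdigest()


url = "https://example.com/page"

plain = request_fingerprint("GET", url)
with_extra = request_fingerprint("GET", url, {FINGERPRINT_EXTRA_KEY: ["lang=de"]})
precomputed = request_fingerprint("GET", url, {FINGERPRINT_KEY: "my-own-id"})
```

If both the duplicates filter and the cache storages called a function like this, a per-request customization would only have to be expressed once, in `meta`, rather than in every component.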
Issue Analytics
- State:
- Created 9 years ago
- Comments:40 (32 by maintainers)
Top GitHub Comments
Hey! Currently I’m not sure we should be developing a rule engine for this. For example, instead of
I’d prefer to have something along these lines:
This way one can use any Python function, and instead of using a dict with string constants we use function arguments. This allows to
This exact API probably won’t work, as I’d like it to handle more cases - it’d be good to have a way to customize it not only in settings.py, but also in middlewares and in a spider as well, per-request. Anyways, you get the idea 😃
By the way, https://github.com/scrapy/scrapy/pull/3420 may be relevant, as we started to look at fingerprints and thinking about API.
@lenzai Fingerprints go beyond URLs.