Parallelize blocking (Fingerprinter)

AFAIK, Fingerprinter.__call__ is embarrassingly parallel: you just need to partition your records into as many chunks as you have CPUs, call Fingerprinter.__call__ on each partition, then reduce the results by writing them to a single blocking_map table.
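A minimal sketch of that partition / map / reduce shape, using only the standard library. `toy_fingerprinter` is a hypothetical stand-in for Fingerprinter.__call__ (assumed here to take `(record_id, record)` pairs and produce `(block_key, record_id)` pairs), and `parallel_blocking_map` and `partition` are illustrative names, not part of dedupe. It also assumes each record's block keys can be computed independently of the other partitions.

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import chain


def partition(records, n):
    """Split a list of (record_id, record) pairs into n interleaved slices."""
    return [records[i::n] for i in range(n)]


def toy_fingerprinter(records):
    """Hypothetical stand-in for Fingerprinter.__call__.

    Emits one (block_key, record_id) pair per record, keyed on the
    first three letters of the name.
    """
    return [(record["name"][:3].lower(), record_id) for record_id, record in records]


def parallel_blocking_map(records, fingerprinter, num_cores,
                          executor_cls=ProcessPoolExecutor):
    """Fingerprint each partition in its own worker, then merge the results."""
    # Drop empty slices so no worker is started for nothing.
    parts = [p for p in partition(records, num_cores) if p]
    with executor_cls(max_workers=num_cores) as pool:
        # Map step: one fingerprinter call per partition.
        results = pool.map(fingerprinter, parts)
        # Reduce step: concatenate into a single blocking map.
        return list(chain.from_iterable(results))
```

Calling `parallel_blocking_map(records, toy_fingerprinter, num_cores=4)` returns the merged pairs, which could then be bulk-inserted into blocking_map. With a process pool the fingerprinter and the records must be picklable, which is where the message passing cost comes from.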

Currently that’s left for the implementer to do. Isn’t that something the library could do, considering it already has a num_cores parameter? I could help with this.

I’ve found https://github.com/dedupeio/dedupe/issues/305, but it’s quite old. That issue mentions message passing costs. For DB-based “big dedupe” applications, though, that’s not a concern, since the data isn’t held in main memory: each worker process can read its own partition of the data directly from the DB.
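To illustrate the DB-based variant, here is a rough sketch in which each worker opens its own connection, reads only its own partition, and writes its block keys straight into blocking_map. It uses the standard library's sqlite3 as a stand-in for PostgreSQL. The `records` table, its `id` and `name` columns, the blocking_map column names, and the `toy_block_key` predicate are all assumptions made for the example, not dedupe's schema or API.

```python
import sqlite3


def toy_block_key(name):
    """Hypothetical stand-in for a real blocking predicate."""
    return name[:3].lower()


def block_partition(db_path, worker_index, num_workers):
    """Fingerprint one partition using a connection owned by this worker.

    The partition is selected in SQL (id modulo num_workers), so no
    record data has to be passed in from the parent process.
    """
    con = sqlite3.connect(db_path)
    try:
        rows = con.execute(
            "SELECT id, name FROM records WHERE id % ? = ?",
            (num_workers, worker_index),
        )
        pairs = [(toy_block_key(name), record_id) for record_id, name in rows]
        with con:  # commits the inserts if no exception is raised
            con.executemany(
                "INSERT INTO blocking_map (block_key, record_id) VALUES (?, ?)",
                pairs,
            )
        return len(pairs)
    finally:
        con.close()
```

Each `block_partition(db_path, i, num_workers)` call is independent, so the calls for `i` in `range(num_workers)` could be handed to separate processes. Only the database path and two integers cross the process boundary.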

Even if we decide the library won’t do that by default, maybe we should update “big dedupe” DB-based examples like pgsql_big_dedupe_example.py to parallelize blocking?

Issue Analytics

  • State: open
  • Created 3 years ago
  • Comments: 13 (12 by maintainers)

Top GitHub Comments

1 reaction
fgregg commented, Sep 15, 2020

Anyway, as a next step, your plan makes sense, Flávio.

1 reaction
fgregg commented, Sep 15, 2020

#856 would be a good way around that.
