Parallelize blocking (Fingerprinter)
AFAIK, Fingerprinter.__call__ is embarrassingly parallel: you just need to partition your records by the number of CPUs you have, call Fingerprinter.__call__ on each partition, then reduce the results into a single blocking_map table.
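A minimal sketch of that partition/call/reduce flow, assuming a trained dedupe.Dedupe instance whose fingerprinter accepts an iterable of (record_id, record) pairs and yields (block_key, record_id) pairs; the helper names, the round-robin partitioning, and the module-level `deduper` are illustrative assumptions, not library API:

```python
# Sketch only: `deduper` stands for an already-trained dedupe.Dedupe instance
# that the worker processes inherit from the parent (e.g. via fork on Linux).
import itertools
from multiprocessing import Pool

NUM_CORES = 4

def fingerprint_partition(partition):
    # Each worker runs the Fingerprinter over its own slice of records and
    # returns the (block_key, record_id) pairs it produced.
    return list(deduper.fingerprinter(partition))

def parallel_blocking(record_items, num_cores=NUM_CORES):
    # Round-robin the (record_id, record) pairs into one partition per core.
    partitions = [[] for _ in range(num_cores)]
    for i, item in enumerate(record_items):
        partitions[i % num_cores].append(item)

    # Map over the partitions in parallel, then reduce everything into the
    # single list of rows destined for the blocking_map table.
    with Pool(num_cores) as pool:
        results = pool.map(fingerprint_partition, partitions)
    return list(itertools.chain.from_iterable(results))
```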
Currently that’s left to the implementer. Isn’t this something the library could do itself, considering it already has a num_cores parameter? I could help with this.
I’ve found https://github.com/dedupeio/dedupe/issues/305, but it’s quite old. That issue mentions message-passing costs, but for DB-based “big dedupe” applications that isn’t a problem, since the data isn’t in main memory: each worker process can read its own partition of the data from the DB.
Even if we decide the library won’t do this by default, maybe we should update the DB-based “big dedupe” examples, like pgsql_big_dedupe_example.py, to parallelize blocking?
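And a hedged sketch of that DB-based variant, in the spirit of pgsql_big_dedupe_example.py: each worker opens its own connections, streams only its slice of a processed_donors table (picked here with a simple modulo over the primary key, purely for illustration), fingerprints it, and appends to blocking_map. The table and column names, the connection string, and the inherited `deduper` are all assumptions, not the example’s actual code:

```python
# Sketch only: `deduper` is a trained dedupe.Dedupe instance inherited by the
# workers; table/column names mirror pgsql_big_dedupe_example.py but are
# assumptions here, as is the DSN.
from multiprocessing import Pool

import psycopg2
import psycopg2.extras

DSN = "dbname=dedupe"     # placeholder connection string
NUM_WORKERS = 4
FIELDS = ("city", "name", "zip", "state", "address")

def block_partition(worker_id):
    read_con = psycopg2.connect(DSN)
    write_con = psycopg2.connect(DSN)
    # Server-side cursor so the partition is streamed, not loaded into memory.
    with read_con.cursor(name="donor_select") as read_cur:
        read_cur.execute(
            "SELECT donor_id, city, name, zip, state, address "
            "FROM processed_donors WHERE donor_id %% %s = %s",
            (NUM_WORKERS, worker_id))
        records = ((row[0], dict(zip(FIELDS, row[1:]))) for row in read_cur)
        # (block_key, donor_id) pairs for this worker's partition.
        pairs = deduper.fingerprinter(records)
        with write_con.cursor() as write_cur:
            psycopg2.extras.execute_values(
                write_cur, "INSERT INTO blocking_map VALUES %s", pairs)
    write_con.commit()
    read_con.close()
    write_con.close()

if __name__ == "__main__":
    with Pool(NUM_WORKERS) as pool:
        pool.map(block_partition, range(NUM_WORKERS))
```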
Top GitHub Comments
Anyway, as a next step, your plan makes sense, Flávio.
#856 would be a good way around that.