Duplicates when parsing - max `workerNum`?
I’m having duplicate objects in my parsing results.
I’m parsing large CSV files (using `fromFile` and the `on('json')` event) with workers. When I set `workerNum: 4` everything seems to be OK (as far as I can tell), but if I use more, I get duplicates. For example, when I use `workerNum: 8` I get double objects in my parsed result, and when I use `workerNum: 12` I get triplicates. Any idea why? Is there a limit of 4 workers?

Note: my machine has 48 vCPUs.

UPDATE: it seems like 4 workers also produced duplicates.
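For context, a minimal sketch of the setup described above, assuming csvtojson v1 (the version that still had the `workerNum` option and the `json` event); the file name and result handling are hypothetical, not from the report:

```js
const csv = require('csvtojson');

const results = [];

csv({ workerNum: 8 }) // reportedly fine at 4, duplicates at 8 and 12
  .fromFile('./large-file.csv') // hypothetical file name
  .on('json', (jsonObj) => {
    // each parsed row arrives here; with workerNum: 8 every object
    // showed up twice, with workerNum: 12 three times
    results.push(jsonObj);
  })
  .on('done', (error) => {
    if (error) console.error(error);
    else console.log(`parsed ${results.length} rows`);
  });
```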
Issue Analytics
- Created: 6 years ago
- Comments: 9 (5 by maintainers)
Hi, please check out the Asynchronously Process section here: https://github.com/Keyang/node-csvtojson#asynchronouse-result-process

For your situation you can do:
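The snippet that followed was not preserved in this copy of the thread; based on the linked README section, it presumably used v2’s asynchronous result processing pattern, roughly like this (the file path is hypothetical):

```js
const csv = require('csvtojson');

csv()
  .fromFile('./large-file.csv') // hypothetical path
  .subscribe((jsonObj) => {
    // returning a Promise tells csvtojson to wait until this row's
    // async processing finishes before emitting the next row
    return new Promise((resolve, reject) => {
      // ... async work on jsonObj, then:
      resolve();
    });
  })
  .on('done', (error) => {
    if (error) console.error(error);
  });
```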
I have temporarily removed support of `workerNum` in v2, as workers created too much overhead on inter-process communication. The `background` option is in the same situation. If you are building a Node.js based web service, it would probably be better to use the built-in `cluster` feature to utilise multiple cores. I will add another way to support multiple CPU cores for parsing `fromFile` in the future, as dividing the file into multiple chunks seems to be the only proper way to parse in parallel.

~Keyang
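For illustration, a rough sketch of the built-in `cluster` approach suggested above: fork one parser process per input and parse each on its own core. The file names and the one-file-per-worker split are assumptions for the sketch, not anything from the thread:

```js
const cluster = require('cluster');
const csv = require('csvtojson');

if (cluster.isMaster) {
  // hypothetical work items: one CSV file per worker process
  const files = ['./part-1.csv', './part-2.csv', './part-3.csv'];
  for (const file of files) {
    cluster.fork({ CSV_FILE: file }); // pass the assignment via env
  }
  cluster.on('exit', (worker, code) => {
    console.log(`worker ${worker.process.pid} exited with code ${code}`);
  });
} else {
  csv()
    .fromFile(process.env.CSV_FILE)
    .then((rows) => {
      console.log(`${process.env.CSV_FILE}: ${rows.length} rows`);
      process.exit(0);
    });
}
```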
Great, thank you! Can’t wait for parallel parsing. Not sure I would go for `cluster` right now, but I’ll definitely look into it!
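For reference, the chunked parallel parsing Keyang describes (dividing the file into byte ranges that separate processes parse independently) could start from a helper like the sketch below. `chunkRanges` is a hypothetical name, and quoted fields containing embedded newlines are ignored for brevity:

```js
const fs = require('fs');

// Split a file into n byte ranges, nudging each boundary forward to the
// next '\n' so no CSV record is cut in half. A real implementation would
// also need CSV-aware scanning for quoted fields with embedded newlines.
function chunkRanges(path, n) {
  const size = fs.statSync(path).size;
  const approx = Math.ceil(size / n);
  const fd = fs.openSync(path, 'r');
  const ranges = [];
  let start = 0;
  while (start < size) {
    let end = Math.min(start + approx, size);
    const byte = Buffer.alloc(1);
    while (end < size) {
      fs.readSync(fd, byte, 0, 1, end);
      end += 1;
      if (byte[0] === 0x0a) break; // stop just past the newline
    }
    ranges.push({ start, end: end - 1 }); // inclusive end, as createReadStream expects
    start = end;
  }
  fs.closeSync(fd);
  return ranges;
}

// Each range can then be streamed to its own parser, e.g.:
//   fs.createReadStream('./large-file.csv', ranges[i]).pipe(/* csv parser */);
```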