parser.apply does not return for a long time even though the progress bar indicates it finishes parsing
Description of the bug
This is not a bug, but a performance issue. It is not noticeable when parsing a small number of documents, but with a large corpus parser.apply does not return even though the progress bar indicated that parsing finished long ago (an hour or more).
To Reproduce
Steps to reproduce the behavior:
- Parse many documents (my case: ~2500)
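To quantify the gap between the progress bar completing and parser.apply returning, the call could be wrapped in a small timer. This is only a minimal sketch: `timed_call` is a hypothetical helper, not part of Fonduer, and the commented usage assumes a `parser` and a `doc_preprocessor` that have already been set up.

```python
import time
from typing import Any, Callable, Tuple


def timed_call(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call fn(*args, **kwargs) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Hypothetical usage (assumes `parser` and `doc_preprocessor` already exist):
# _, seconds = timed_call(parser.apply, doc_preprocessor)
# print(f"parser.apply returned after {seconds:.0f} s")
```

Comparing the reported time against the moment the progress bar reaches 100% gives a rough measure of how long the call spends after parsing appears to be done.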
Expected behavior
parser.apply returns when the progress bar indicates it finished parsing all the documents.
Environment (please complete the following information)
- OS: Debian Buster
- PostgreSQL Version: 12.1
- Poppler Utils Version: N/A
- Fonduer Version: 0.8.3+dev (01e0d9319b523aff7aa7f5c583a9f330b0705ecc)
Issue Analytics
- State:
- Created 3 years ago
- Comments:14 (10 by maintainers)

I’ve tested master on ~200 docs and can confirm that these changes fix OOM errors and slow performance. Many thanks for the fix.

Your description sounds correct to me, and this is definitely a real bottleneck that several people have run into. We would love to try to resolve this, but I suspect it wouldn’t be a quick fix.
I’m going to reopen this issue.