workspace.add_file without incremental refcounts
I don’t know if this is intentional: when looping over large workspaces and adding files to them, the workspace.mets._file_by_id dict keeps growing and thus leaks all the new OcrdFile and METS file element etree references. This can create extreme memory overhead and eventually slow things down (because each new file has to be compared to ever more existing IDs).
Perhaps one can at least try to sever the references to the file etrees?
But an opt-out of the general _file_by_id dict mechanism is probably best seen as part of #416, right?
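To make the growth pattern concrete, here is a minimal toy sketch. It does not use the real OCR-D classes: ToyFile, StrongCacheMets, WeakCacheMets and cache_size_after_loop are hypothetical names invented for illustration, and only the attribute name _file_by_id is taken from the report above. It contrasts a plain dict cache, which keeps every added file alive, with a weakref.WeakValueDictionary, which is one possible way to sever the references once the caller no longer holds the file. Whether that fits the actual implementation is an open question.

```python
import gc
import weakref


class ToyFile:
    """Hypothetical stand-in for an OcrdFile wrapping a METS file element."""

    def __init__(self, file_id):
        self.ID = file_id


class StrongCacheMets:
    """Toy model: a plain dict keeps every added file alive."""

    def __init__(self):
        self._file_by_id = {}

    def add_file(self, file_id):
        f = ToyFile(file_id)
        self._file_by_id[file_id] = f  # strong reference, never released
        return f


class WeakCacheMets:
    """Toy model: an entry disappears once nobody else holds the file."""

    def __init__(self):
        self._file_by_id = weakref.WeakValueDictionary()

    def add_file(self, file_id):
        f = ToyFile(file_id)
        self._file_by_id[file_id] = f  # weak reference only
        return f


def cache_size_after_loop(mets, n):
    """Add n files, discard the returned objects, report the cache size."""
    for i in range(n):
        mets.add_file(f"FILE_{i:04d}")
    gc.collect()  # make collection explicit on non-refcounting interpreters
    return len(mets._file_by_id)


strong_size = cache_size_after_loop(StrongCacheMets(), 1000)
weak_size = cache_size_after_loop(WeakCacheMets(), 1000)
print(strong_size, weak_size)
```

In the strong variant the cache holds one entry per added file for as long as the workspace object lives; in the weak variant the entries go away as soon as the loop drops its references. The trade-off is that a weak cache only helps lookups for files that some caller still holds.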
Issue Analytics
- State:
- Created 4 years ago
- Comments: 5 (2 by maintainers)
Top GitHub Comments
I noticed that as well when removing the trivial caching. On my TODO list.
BTW, this is a common cause of inefficiency, and not just when growing workspaces by adding more and more files: any processor running on a large workspace has to pay the penalty before it can start.