Spacy convert not writing files larger than 2 GB.
How to reproduce the behaviour
Concatenate a .conllu file into a single file over and over until it is several gigabytes, then run "spacy convert" on it to produce JSON. The output file stops growing at about 2147479553 bytes (roughly 2 GB).
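A minimal repro sketch of the steps above, assuming the spaCy 2.x CLI; the file names (sample.conllu, big.conllu, output_dir) and the repeat count are illustrative, not taken from the report:

```python
# Repro sketch: build a CoNLL-U file comfortably larger than 2 GB by
# repeated concatenation, then convert it with the spaCy CLI.
import shutil
import subprocess
from pathlib import Path

with open("big.conllu", "wb") as out:
    for _ in range(500):  # repeat until big.conllu is well over 2 GB
        with open("sample.conllu", "rb") as src:
            shutil.copyfileobj(src, out)

Path("output_dir").mkdir(exist_ok=True)

# spaCy 2.x: convert CoNLL-U to the v2 JSON training format.
subprocess.run(
    ["python", "-m", "spacy", "convert", "big.conllu", "output_dir"],
    check=True,
)

# Inspect the size of the JSON file written to output_dir; in the report it
# plateaus at about 2147479553 bytes regardless of how large the input gets.
```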
Your Environment
- Operating System: Linux Ubuntu 18.04
- Python Version Used: 3.7 and 3.8
- spaCy Version Used: 2.3.2
- Environment Information:
Issue Analytics
- State: closed
- Created 3 years ago
- Comments: 10 (7 by maintainers)
Top GitHub Comments
Not sure how it truncates the file, but it appears to work correctly: a 650 MB conllu -> 2 GB json, a 1300 MB conllu -> 2.1 GB json, a 2600 MB conllu -> 2.1 GB json, and it trains correctly.
(I was checking how the resulting quality changes as the dataset size changes.)
Thanks, I’ll move to a list of smaller files, but this issue should at least be reported, and an error should appear when it happens.
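For reference, a sketch of the kind of check being asked for here; this is hypothetical code, not spaCy's actual implementation, and write_json_checked is an invented name:

```python
# Hypothetical post-write sanity check (not spaCy's code): serialize first,
# write, then confirm the on-disk size matches what was serialized, so a
# silent truncation raises an error instead of passing quietly.
import json
from pathlib import Path

def write_json_checked(path, data):
    payload = json.dumps(data, ensure_ascii=False)
    expected = len(payload.encode("utf-8"))
    Path(path).write_text(payload, encoding="utf-8")
    actual = Path(path).stat().st_size
    if actual != expected:
        raise IOError(
            f"Output truncated: wrote {actual} of {expected} bytes to {path}"
        )
```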
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.