2GB .fif limit
Hi everyone,
Mid-term MNE-Python user here. I have never run into problems saving FIF files before: if I have a dataset with around 6GB worth of data, whenever I save my epochs, MNE-Python usually splits them into several < 2GB files and appends "-1", "-2", and so on to the filenames. Then all I have to do is load the main file, and it automatically detects that a split has happened, iteratively loads each of these < 2GB parts, and puts everything back together into the variable assigned from `mne.read_epochs`.
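To make the usual behaviour concrete, here is a minimal sketch of that round trip using synthetic data (the channel counts, filename, and sizes are made up for illustration):

```python
import numpy as np
import mne

# Synthetic stand-in for real data: 10 epochs, 5 EEG channels, 1 s at 100 Hz.
info = mne.create_info(ch_names=5, sfreq=100.0, ch_types="eeg")
epochs = mne.EpochsArray(np.random.randn(10, 5, 100), info)

# Save with the default 2GB split size; when the data exceed the limit,
# MNE-Python writes additional numbered part files next to the main one.
epochs.save("sub01-epo.fif", split_size="2GB", overwrite=True)

# Reading the main file detects any split parts and stitches them back
# together into a single Epochs object.
epochs_loaded = mne.read_epochs("sub01-epo.fif")
```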
Now that I am dealing with more and more data, I am noticing a few edge cases. Sometimes I get the error telling me I can't save a file because of this < 2GB restriction, even though the file in question is much smaller than others that have saved successfully.
My analysis pipeline requires both larger and smaller subsets taken from the same data. The large data save fine (MNE-Python splits the files evenly), but I am noticing that in many of the cases where I extract subsets, the splitting fails: it tries to create a > 2GB file in my current directory and then raises the error.
I am not sure how it works internally, but it looks like there is a calculation of sorts that determines what to do when the data are large. The subsets could easily have been split up into multiple < 2GB files, but this isn't happening when the data are, say, only slightly over the limit.
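For what it's worth, here is the rough back-of-the-envelope check I have been doing to guess whether a subset should fit under the limit, reusing the `epochs` object from the snippet above. It only measures the in-memory array size, so it presumably underestimates the on-disk FIF size (headers and other overhead are not counted):

```python
# Rough in-memory size of the epochs data, ignoring FIF overhead.
data = epochs.get_data()   # shape: (n_epochs, n_channels, n_times)
size_gb = data.nbytes / 1024 ** 3
print(f"~{size_gb:.3f} GB of raw samples vs. the 2GB split limit")
```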
I can describe the saving process as having three main cases:
- Data is < 2GB: `epochs.save` works fine
- Data is >> 2GB (way over): `epochs.save` splits the files and saves successfully
- Data is "slightly" over 2GB: `epochs.save` doesn't split and raises an exception
Does anyone have any insight into the exact condition checking, and can anyone see whether there is a bug somewhere? It's awkward to keep splitting the data myself before saving and concatenating the pieces back together when I need them (see the sketch below). Those steps take a fair bit of time, and ideally I would not have to split the data arbitrarily just because they are slightly over the limit.
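For reference, this is roughly the manual workaround I have been using in the meantime; the two-way split and the filenames are arbitrary, and `epochs` is assumed to be loaded already:

```python
import mne

# Save the epochs in two arbitrary halves, each safely under 2GB...
n = len(epochs)
epochs[: n // 2].save("sub01-part1-epo.fif", overwrite=True)
epochs[n // 2 :].save("sub01-part2-epo.fif", overwrite=True)

# ...then load the pieces and stitch them back together.
parts = [mne.read_epochs(f) for f in ("sub01-part1-epo.fif", "sub01-part2-epo.fif")]
epochs_full = mne.concatenate_epochs(parts)
```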
If it's not a bug, then that's fine. I am just interested in knowing why it works that way, so if anyone can shed some light, it'd be very much appreciated.
I was trying to think of a code snippet to demonstrate the problem, but that's tricky given that it's a large-data issue and not something easily shown with a nice, clean, reproducible example (sorry!).
Best,
Alex
Top GitHub Comments
I've now backported #7740 to maint/0.20. @agramfort, worth another quick release for this?
Should be fixed in master by #7740. Can you try it on one of your problematic cases? We didn’t backport to 0.20 but maybe we should.