Analysing the performance of different methods to get windows
I’ve started looking at the performance of various ways of getting windows:
1- MNE: epochs.get_data(ind) with lazy loading (preload=False)
2- MNE: epochs.get_data(ind) with eager loading (preload=True)
3- MNE: direct access to the internal numpy array with epochs._data[index] (requires eager loading)
4- HDF5: using h5py (lazy loading)
The script that I used to run the comparison is here:
Also, I ran the comparison on a single CPU using:
>>> taskset -c 0 python profiling_mne_epochs.py
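A minimal sketch of how a per-window access time could be measured (this is not the script linked above). It assumes a hypothetical in-memory array `windows` of shape (n_windows, n_channels, n_times) as a stand-in for the real data. The helper names `time_per_call_ms`, `direct_index` and `index_with_copy` are made up for illustration; a getter wrapping `epochs.get_data` or an h5py dataset could be passed in instead.

```python
import timeit

import numpy as np


def time_per_call_ms(get_window, n_windows, n_calls=200, repeat=5, seed=0):
    """Return the best-of-`repeat` mean time per call, in milliseconds."""
    rng = np.random.default_rng(seed)
    # Random indices, so that access is not purely sequential.
    indices = [int(i) for i in rng.integers(0, n_windows, size=n_calls)]

    def loop():
        for i in indices:
            get_window(i)

    # Each repeat runs the whole loop once; keep the fastest run.
    best = min(timeit.repeat(loop, repeat=repeat, number=1))
    return 1e3 * best / n_calls


# Hypothetical stand-in data: 100 windows, 22 channels, 1000 time samples.
windows = np.random.default_rng(0).standard_normal(
    (100, 22, 1000), dtype=np.float32
)


def direct_index(i):
    # Returns a view on the in-memory array, no copy is made.
    return windows[i]


def index_with_copy(i):
    # Forces the window to be copied into a new array.
    return windows[i].copy()


timings = {
    f.__name__: time_per_call_ms(f, len(windows))
    for f in (direct_index, index_with_copy)
}
for name, ms in timings.items():
    print(f"{name}: {ms:.4f} ms per call")
```

Taking the minimum over repeats is a common way to reduce noise from other processes, on top of pinning the run to one CPU with taskset.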
Here’s the resulting figure, where the x-axis is the number of time samples in the continuous recording:
For the moment, it looks like:
1- ._data[index] is unsurprisingly the fastest, but it requires loading the entire data into memory.
2- hdf5 is very close, with around 0.5 ms per loop, which is great knowing it’s able to only load one window at a time.
3- get_data(index) is much slower, but this is expected as we know it creates a new mne.Epochs object every time it’s called. Also, the gap between preload=True and preload=False is about 1.5 ms, which might be OK. The main issue though seems to be the linear increase of execution time as the continuous data gets bigger and bigger.
Considering the benefits of using MNE for handling the EEG data inside the Dataset classes, I think it would be important to dive deeper into the inner workings of get_data() to see whether simple changes could make this more efficient. I can do some actual profiling on that. What do you think @agramfort @robintibor @gemeinl?
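As a rough idea of what such profiling could look like, here is a minimal sketch using the standard library's cProfile and pstats. The functions `get_window_standin` and `_slow_helper` are hypothetical placeholders, not MNE code; in practice the profiled callable would be something that calls get_data().

```python
import cProfile
import io
import pstats


def profile_calls(func, args_list, top=10):
    """Profile repeated calls to `func` and return a text report.

    The report lists the `top` entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    for args in args_list:
        func(*args)
    profiler.disable()

    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return stream.getvalue()


def _slow_helper(n):
    # Placeholder for an expensive internal step.
    return sum(i * i for i in range(n))


def get_window_standin(index):
    # Hypothetical stand-in for a window getter such as get_data().
    return _slow_helper(1000 + index)


report = profile_calls(get_window_standin, [(i,) for i in range(200)])
print(report)
```

Sorting by cumulative time shows which internal calls dominate the total time spent inside the getter.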
Note: I haven’t included the extraction of labels in this test.
- Created 4 years ago
- Comments:25 (7 by maintainers)
Top GitHub Comments
Great @hubertjb. Seems we are getting to a reasonable training time range. It would also be interesting to see how big the difference is for Deep4. And as you said, maybe num_workers would already close the gap enough to consider it finished. I would say a gap of 1.5x for Deep4 is acceptable to me.
Cool, thanks for the clear info! Yes, diving a bit deeper may be helpful. Keep in mind: we will need fast access mainly during the training loop, so directly before returning some tensor/ndarray (in the usual case) that will be passed to the deep network. So for preload=True, accessing _data may be fine to me. The question is more the preload=False case, if this one can be fast enough in mne as well. So the relatively small gap for get_data there is encouraging for sure.
You could additionally do the following on a reasonable GPU to get a better idea of what kind of times we may need to reach in the end: forward one dummy batch of size (64,22,1000) through the deep and shallow networks, compute the classification loss with dummy targets, do the backward pass, and measure the wall clock time (don’t use profilers here for now, they may not work well with GPU). Then we have a rough time we want to reach…
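A minimal PyTorch sketch of that measurement, under some assumptions: `make_standin_net` is a small made-up convolutional network used as a placeholder (not the actual deep or shallow network), the number of classes (4) is arbitrary, and the helper name `time_forward_backward` is hypothetical. The real models would be passed in instead of the stand-in.

```python
import time

import torch
from torch import nn


def make_standin_net(n_channels=22, n_classes=4):
    # Placeholder model, NOT the deep/shallow network from the discussion.
    return nn.Sequential(
        nn.Conv1d(n_channels, 40, kernel_size=25),
        nn.ELU(),
        nn.AdaptiveAvgPool1d(1),
        nn.Flatten(),
        nn.Linear(40, n_classes),
    )


def time_forward_backward(model, batch_shape=(64, 22, 1000), n_classes=4,
                          n_repeats=5, device="cpu"):
    """Return the fastest wall-clock time (s) of forward + loss + backward."""
    on_gpu = torch.device(device).type == "cuda"
    model = model.to(device)
    X = torch.randn(*batch_shape, device=device)  # dummy batch
    y = torch.randint(0, n_classes, (batch_shape[0],), device=device)  # dummy targets
    loss_fn = nn.CrossEntropyLoss()

    times = []
    for _ in range(n_repeats + 1):  # the first iteration is a warm-up
        model.zero_grad()
        if on_gpu:
            torch.cuda.synchronize()  # wait for pending GPU work
        start = time.perf_counter()
        loss = loss_fn(model(X), y)
        loss.backward()
        if on_gpu:
            torch.cuda.synchronize()  # GPU kernels run asynchronously
        times.append(time.perf_counter() - start)
    return min(times[1:])


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    seconds = time_forward_backward(make_standin_net(), device=device)
    print(f"forward + backward on {device}: {1e3 * seconds:.2f} ms")
```

The explicit synchronisation matters on GPU, since without it the timer could stop before the kernels have actually finished. Dividing the measured time by the batch size gives a rough per-window budget to compare against the data loading times.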