Analysing the performance of different methods to get windows

I’ve started looking at the performance of various ways of getting windows:
1- MNE: epochs.get_data(ind)[0] with lazy loading (preload=False)
2- MNE: epochs.get_data(ind)[0] with eager loading (preload=True)
3- MNE: direct access to the internal numpy array with epochs._data[index] (requires eager loading)
4- HDF5: using h5py (lazy loading)

The script that I used to run the comparison is here: https://github.com/hubertjb/braindecode/blob/profiling-mne-epochs/test/others/profiling_mne_epochs.py

Also, I ran the comparison on a single CPU using:
>>> taskset -c 0 python profiling_mne_epochs.py
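For readers who want to reproduce per-window timings without the full script, a minimal sketch of such a harness could look like the following. It is not the linked script: `time_accessor`, `get_window`, and the in-memory toy windows are hypothetical names used for illustration, and each of the access methods listed above would be plugged in as the `get_window` callable.

```python
import timeit


def time_accessor(get_window, indices, repeats=5):
    """Return the best-of-`repeats` mean time per call, in milliseconds.

    `get_window` is any callable mapping a window index to its data.
    `indices` must be a sized sequence (for example a list or a range).
    """
    def run_once():
        for i in indices:
            get_window(i)

    # timeit.repeat returns one total duration (in seconds) per repeat
    totals = timeit.repeat(run_once, number=1, repeat=repeats)
    return min(totals) / len(indices) * 1e3


if __name__ == "__main__":
    # Toy stand-in for a dataset: 100 windows of 50 samples each (not real EEG)
    windows = [[float(i)] * 50 for i in range(100)]
    ms = time_accessor(lambda i: windows[i], range(len(windows)))
    print(f"{ms:.6f} ms per window")
```

Taking the minimum over repeats is a common way to reduce noise from other processes, which complements pinning the run to one CPU with `taskset`.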

Here’s the resulting figure, where the x-axis is the number of time samples in the continuous recording: [figure: timing_results]

For the moment, it looks like:
1- ._data[index] is unsurprisingly the fastest, but it requires loading the entire dataset into memory.
2- hdf5 is very close, at around 0.5 ms per loop, which is great given that it only loads one window at a time.
3- get_data(index) is much slower, but this is expected since we know it creates a new mne.Epochs object every time it’s called. Also, the gap between preload=True and preload=False is about 1.5 ms, which might be OK. The main issue, though, seems to be that execution time increases linearly as the continuous data gets bigger.
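One way to put a number on the linear growth described in point 3 is to fit a straight line to the measured (recording length, time per loop) pairs. The sketch below is an assumption about how one might do that, not part of the original script; `fit_line` is a hypothetical helper, and no measured values are included.

```python
def fit_line(sizes, times_ms):
    """Ordinary least-squares fit of times_ms = slope * sizes + intercept."""
    n = len(sizes)
    if n < 2 or n != len(times_ms):
        raise ValueError("need at least two (size, time) pairs of equal length")
    mean_x = sum(sizes) / n
    mean_y = sum(times_ms) / n
    # Sum of squared deviations of the sizes; zero means all sizes are equal
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    if sxx == 0:
        raise ValueError("sizes must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, times_ms))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

Applied to the benchmark output, the slope is the extra time in milliseconds per additional time sample and the intercept is the fixed per-call overhead. A clearly positive slope for get_data would be consistent with the linear increase noted above.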

Next steps

Considering the benefits of using MNE for handling the EEG data inside the Dataset classes, I think it would be important to dive deeper into the inner workings of get_data() to see whether simple changes could make this more efficient. I can do some actual profiling on that. What do you think @agramfort @robintibor @gemeinl ?
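As a starting point for that profiling, a minimal sketch using the standard library's cProfile could look like this. `profile_calls` is a hypothetical helper, not something from MNE or braindecode, and it makes no assumption about what the profiled function does internally.

```python
import cProfile
import io
import pstats


def profile_calls(fn, args_list, top=10):
    """Profile fn(*args) for every args tuple and return a text report."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        for args in args_list:
            fn(*args)
    finally:
        profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so expensive call chains appear first
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()
```

For example, `print(profile_calls(epochs.get_data, [(i,) for i in range(100)]))` would mirror the `epochs.get_data(ind)` call pattern from the benchmark, assuming `epochs` is an already constructed mne.Epochs object.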

Note: I haven’t included the extraction of labels in this test.

Issue Analytics

Created 4 years ago
Comments: 25 (7 by maintainers)

Top GitHub Comments

robintibor commented, Feb 10, 2020

Great @hubertjb. Seems we are getting to a reasonable training time range. It would also be interesting to see how big the difference is for Deep4. And as you said, maybe num_workers would already close the gap enough to consider it finished. To me, a gap of 1.5x for Deep4 is acceptable.

robintibor commented, Jan 30, 2020

Cool, thanks for the clear info! Yes, diving a bit deeper may be helpful. Keep in mind: we will need fast access mainly during the training loop, so directly before returning some tensor/ndarray (in the usual case) that will be passed to the deep network. So for preload=True, accessing _data may be fine by me. The question is more about the preload=False case, and whether it can be fast enough in mne as well. So the relatively small gap for get_data there is encouraging for sure.

You could additionally do the following on a reasonable GPU to get a better idea of what kind of times we may need to reach in the end: forward one dummy batch of size (64, 22, 1000) through the deep and shallow networks, compute the classification loss with dummy targets, do the backward pass, and measure the wall clock time (don’t use profilers here for now, they may not work well with GPU). Then we have a rough time we want to reach…
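A minimal sketch of such a wall-clock measurement, kept framework-agnostic on purpose, might look like this. `time_step`, `step_fn`, and `sync_fn` are hypothetical names; the networks, loss, and dummy batch are assumed to live inside whatever callable is passed in.

```python
import time


def time_step(step_fn, sync_fn=None, n_warmup=3, n_repeats=10):
    """Return the mean wall-clock time in seconds per call of step_fn.

    sync_fn, if given, is called before each clock reading so that
    asynchronous work (e.g. queued GPU kernels) is included in the timing.
    """
    def sync():
        if sync_fn is not None:
            sync_fn()

    for _ in range(n_warmup):  # warm-up iterations are not timed
        step_fn()
    sync()
    start = time.perf_counter()
    for _ in range(n_repeats):
        step_fn()
    sync()
    return (time.perf_counter() - start) / n_repeats
```

With PyTorch, `step_fn` would run the forward pass on the dummy batch, compute the loss against dummy targets, and call `backward()`, and `sync_fn` would be `torch.cuda.synchronize`, since CUDA work is queued asynchronously and the clock could otherwise be read before the GPU has finished.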