Improve control over number of threads used in an mne call

Description

I propose using threadpoolctl to improve control over the number of threads used throughout mne and in calls to external libraries like numpy. This appears to be the direction numpy has moved in, as discussed here: https://github.com/numpy/numpy/issues/11826

Reasoning

I have had trouble fully controlling the number of threads used by various mne functions. Many mne functions have an n_jobs argument that controls the number of threads used in that function, but there are cases where code within that function can escape this limit due to externally defined reference values. There are also functions like mne.chpi.filter_chpi that do not have an n_jobs argument but can still parallelize. It is possible to control this with environment variables as discussed here, but that only works if they are set before you import the respective library, e.g. numpy. The easiest way to control thread limits after an import has happened appears to be threadpoolctl.
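The import-order constraint described above can be illustrated with a small sketch. The helper name `set_thread_env` is hypothetical; the variable names are the ones commonly read by OpenMP, OpenBLAS, and MKL, and which of them actually applies depends on the BLAS build numpy is linked against.

```python
import os

# Environment variables commonly read by OpenMP, OpenBLAS, and MKL when
# the library is first loaded. Which one matters depends on the BLAS build.
THREAD_ENV_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def set_thread_env(n_threads):
    """Hypothetical helper: request n_threads via environment variables."""
    for name in THREAD_ENV_VARS:
        os.environ[name] = str(n_threads)


# This has to run before the first `import numpy` (or `import mne`);
# setting the variables afterwards generally has no effect on thread
# pools that are already initialised.
set_thread_env(1)
```

Because the variables are read once at load time, this approach cannot change the limit for a single call later in the session, which is the gap threadpoolctl is meant to fill.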

Proposed Implementation

I have successfully used the following syntax

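import mne
from threadpoolctl import threadpool_limits

# Cap BLAS threads at n_jobs for everything inside the block.
with threadpool_limits(limits=n_jobs, user_api="blas"):
    mne.do_something()  # placeholder for any mne call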

to control threads used in an mne call. The same could be used internally to make better use of the existing n_jobs argument without forcing the user to do it themselves. If this proves successful, it might make sense to add the n_jobs argument in even more places.
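One way the internal use suggested here might look is a decorator that reads the `n_jobs` keyword and applies the limit around the call. This is only a sketch: `limit_blas_threads` and `filter_data` are hypothetical names, not part of mne, and the fallback to a no-op context when threadpoolctl is not installed is an assumption about how an optional dependency could be handled.

```python
import functools
from contextlib import nullcontext

try:
    from threadpoolctl import threadpool_limits
except ImportError:  # assumption: threadpoolctl is an optional dependency
    threadpool_limits = None


def limit_blas_threads(func):
    """Hypothetical decorator: cap BLAS threads at the caller's n_jobs."""

    @functools.wraps(func)
    def wrapper(*args, n_jobs=None, **kwargs):
        # Only apply a limit for a positive integer; None leaves pools untouched.
        if threadpool_limits is not None and isinstance(n_jobs, int) and n_jobs > 0:
            ctx = threadpool_limits(limits=n_jobs, user_api="blas")
        else:
            ctx = nullcontext()
        with ctx:
            return func(*args, n_jobs=n_jobs, **kwargs)

    return wrapper


@limit_blas_threads
def filter_data(data, n_jobs=None):
    """Stand-in for an mne function that accepts n_jobs."""
    return [x * 2 for x in data]
```

The wrapper only sees `n_jobs` when it is passed as a keyword, so a real implementation would need to handle positional use as well.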

Issue Analytics

State:
Created a year ago
Comments: 15 (9 by maintainers)

Top GitHub Comments

agramfort commented, Apr 19, 2022

Regarding n_jobs=None: I like the idea, because it does not change the current default behaviour but allows handing over control to a lower-tier context. However, it should always be clear what takes precedence. Unless otherwise explained, I would expect that the call raw.filter(…, n_jobs=<a_number>) overrules whatever an external context defines, unless the value is None as you defined above. Do I understand correctly that setting n_jobs=-1 would still mean that all available cores are used?

Yes, it’s the sklearn behavior.
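The precedence described in this exchange could be summarised as follows. This is a sketch of the convention as discussed in the thread, not mne's or scikit-learn's actual code; `resolve_n_jobs` and `context_n_jobs` are hypothetical names, and the handling of negative values follows joblib's documented convention.

```python
import os


def resolve_n_jobs(n_jobs, context_n_jobs=None):
    """Hypothetical helper: turn an n_jobs argument into a worker count.

    n_jobs=None defers to the surrounding context (or 1 if there is none),
    an explicit number overrides the context, and -1 means all cores.
    """
    if n_jobs is None:
        n_jobs = 1 if context_n_jobs is None else context_n_jobs
    if n_jobs == 0:
        raise ValueError("n_jobs must not be 0")
    if n_jobs < 0:
        n_cpus = os.cpu_count() or 1
        # joblib-style convention: -1 is all cores, -2 is all but one, ...
        n_jobs = max(1, n_cpus + 1 + n_jobs)
    return n_jobs
```

With this convention, `resolve_n_jobs(2, context_n_jobs=4)` keeps the explicit value, while `resolve_n_jobs(None, context_n_jobs=4)` hands control to the context.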

dafrose commented, Apr 19, 2022

@larsoner thanks for the offer. I would love to, but I am afraid it would take some time. I already have a few PRs on my to-do list, one of them for mne, and I haven’t gotten around to any of them yet… So if it can wait for a few weeks™, maybe. But I won’t mind if someone else does it before then.