memory issues when getting tsv in derivatives
Can’t see this issue anywhere else.
I can use BIDSLayout.get() to get the NIfTI files, and it does so in seconds. But if I try to get the confounds tsv files instead, the Python session gets killed because it uses over 16GB of RAM:
layout.get(scope='fMRIPrep', desc='confounds', suffix='regressors', extension='tsv', return_type='file')
Getting a single subject works, but uses around 10GB of RAM (and querying per subject does not feel ideal).
The following will work:
layout.get(scope='derivatives', desc='preproc', extension='nii.gz')
So the problem occurs only when getting tsv files. It doesn’t matter what return_type is set to (I put return_type='file' in the example to show that it’s not just a problem when returning BIDSDataFile objects). There are no more confounds tsv files than preproc NIfTI files either.
Is this a known/common problem, or could there be a specific reason why I am hitting it? It feels counter-intuitive that getting the tsv files takes up so much RAM. Is there a way around it that isn’t just looping over subjects?
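One possible stopgap, as a minimal sketch rather than a fix for the underlying query: skip the layout for this one lookup and glob for the confounds files directly. The sketch assumes the files follow the fMRIPrep 1.5.x naming, i.e. they end in `_desc-confounds_regressors.tsv` (which is what desc='confounds', suffix='regressors', extension='tsv' selects). `find_confounds` is a hypothetical helper name, not part of pybids.

```python
from pathlib import Path


def find_confounds(fmriprep_dir):
    """Return sorted paths of confounds tsv files below fmriprep_dir.

    Hypothetical helper: assumes fMRIPrep-style file names ending in
    '_desc-confounds_regressors.tsv'. It does not use pybids at all.
    """
    root = Path(fmriprep_dir)
    # rglob searches all subject/session/func subdirectories recursively.
    matches = root.rglob("*_desc-confounds_regressors.tsv")
    return sorted(str(path) for path in matches)
```

Called as find_confounds('derivatives/fmriprep-1.5.1/fmriprep/'), this returns plain path strings, similar to return_type='file', but without any of the entity or metadata information the layout would provide.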
Information about dataset
Not sure if any of these details are relevant, but they might be helpful.
I have a BIDS directory with ca 250 runs in total. There are about 27 subjects and 2 sessions, with ca 5-9 runs in one session and 1-4 runs in the other. len(layout.files) is 8240. The derivatives are from fMRIPrep. There are some other directories in derivatives, but none of them are added to the layout.
Full code of what I am doing below:
import bids
bids_dir = './' # pwd is in BIDS dataset
fmriprep_dir = 'derivatives/fmriprep-1.5.1/fmriprep/'
layout = bids.BIDSLayout(bids_dir)
layout.add_derivatives(bids_dir + fmriprep_dir)
# This will work
layout.get(scope='fMRIPrep', desc='preproc', extension='nii.gz')
# This will crash the session and use 100% memory on laptop with 16GB RAM
layout.get(scope='fMRIPrep', desc='confounds', suffix='regressors', extension='tsv', return_type='file')
Bids version: 0.9.4
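To put numbers on the difference between the two calls, a small helper around the standard-library tracemalloc module can report the peak memory of a single call. This is only a sketch: `peak_memory_mb` is a hypothetical name, and tracemalloc only counts memory allocated through Python's own allocator, so the figure may be lower than what the operating system reports for the process.

```python
import tracemalloc


def peak_memory_mb(func, *args, **kwargs):
    """Call func(*args, **kwargs) and return (result, peak_mb).

    peak_mb is the peak size of Python-tracked allocations during the
    call, in MiB. Memory allocated directly by C libraries is not counted.
    """
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        # Always stop tracing, even if the call raises.
        tracemalloc.stop()
    return result, peak_bytes / 1024 ** 2
```

With the layout from the code above, peak_memory_mb(layout.get, scope='fMRIPrep', desc='preproc', extension='nii.gz') would measure the query that works. Measuring the tsv query the same way still consumes the memory, so it is safer to try it on a single subject first.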
I’m happy to report that I have no idea how indexes work, and have made no progress on improving performance.
I also note that we don’t currently have indexes (!). That might be a sensible place to start…