How to handle file exclusions and inclusions in 0.8?
This is an issue that has cropped up repeatedly in various guises (e.g., #215, #364, #277, #184, #131, and probably others). The question is how to allow users to specify explicit inclusion and exclusion paths at `BIDSLayout` initialization. The reason for bringing this up again is that, as of version 0.8 (see #369), pybids will no longer depend on grabbit. The cord-cutting means we can no longer rely on the behavior implemented in grabbit. Since this was entirely undocumented in pybids, I think we have a good opportunity to start afresh and hopefully settle on something that works for everyone.
The main constraints I think we should try to respect are:
- We want to exclude a bunch of hard-coded subdirectories by default (e.g., `'code'`, `'stimuli'`, `'sourcedata'`, etc.)
- Users should be able to easily override any of the default exclusions and make sure they’re indexed
- Users should be able to specify arbitrary directories anywhere in the file system that should not be indexed (in the event they’re encountered in any raw or derivatives `BIDSLayout`)
The current approach doesn’t allow users to specify explicit exclusions at all (well, it does, but this is an undocumented grabbit feature). It uses an `include` argument only as a means of negating the default exclusions. E.g., if you want `'stimuli/'` to be indexed, you pass `include=['stimuli']`. Beyond this, there’s no pybids-level ability to control inclusions or exclusions (aside from specifying derivatives, which is a separate matter that I think we’re handling in a satisfactory way). I don’t think this is satisfactory, and a bunch of the opened issues reflect that.
Here are a few proposals (feel free to suggest others):
1. Keep the current approach, where `include` negates values in the default exclusion list, but add an `exclude` argument that causes any matching files/dirs to be skipped during indexing. The main downside I see here is that the behavior is counterintuitive, as `include` and `exclude` act asymmetrically. A potential fix is to give these arguments different names (e.g., `override_exclusions` and `exclude_paths`).
2. Stick with just `exclude`, and have any manually specified value override the default internal list (e.g., if you pass `['code', 'sourcedata']`, then things like `'stimuli'` will now be indexed, and only files/dirs that match the elements in your list will be skipped). The downside of this is it requires users to know what the default exclusions are, and reproduce them, and this will probably get pretty messy.
3. Get rid of the current default exclusion list entirely, and treat `exclude` as a strict list of paths to exclude from indexing. Now that the validator is working properly, directories like `'stimuli'` will automatically be skipped if `validate=True`, because files won’t pass the validator unless they’re explicitly part of the spec. The downside of this option is that it makes it difficult to index selectively; e.g., if you want to index only what’s in `'stimuli'`, you need to set `validate=False` and then pass a whole pile of exclusions (i.e., everything that doesn’t pass the validator except for `'stimuli'`).
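To make the asymmetry in proposal (1) concrete, here is a minimal sketch of how the two renamed arguments might interact. It is not pybids code: `should_index`, `DEFAULT_EXCLUSIONS`, and the prefix-matching rule are assumptions made for illustration, and the default list contains only the three directories named above.

```python
from pathlib import PurePosixPath

# Hypothetical default list: only the three directories named in the issue.
DEFAULT_EXCLUSIONS = ("code", "stimuli", "sourcedata")


def should_index(rel_path, override_exclusions=(), exclude_paths=()):
    """Return True if rel_path (relative to the dataset root) would be indexed.

    override_exclusions removes entries from the default exclusion list;
    exclude_paths adds user-specified path prefixes to skip.
    """
    parts = PurePosixPath(rel_path).parts
    if not parts:
        return True

    # Defaults stay active unless the user explicitly overrides them.
    active_defaults = set(DEFAULT_EXCLUSIONS) - set(override_exclusions)
    if parts[0] in active_defaults:
        return False

    # User exclusions match whole leading path components, not substrings.
    for excl in exclude_paths:
        excl_parts = PurePosixPath(excl).parts
        if excl_parts and parts[:len(excl_parts)] == excl_parts:
            return False
    return True


print(should_index("stimuli/face.png"))
print(should_index("stimuli/face.png", override_exclusions=["stimuli"]))
print(should_index("sub-01/scratch/tmp.txt", exclude_paths=["sub-01/scratch"]))
```

Under these assumptions, overriding `'stimuli'` leaves `'code'` and `'sourcedata'` excluded, which is the behavior proposal (2) would lose unless the user re-listed them.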
I lean towards (1) (with more explicit argument names). Thoughts? If I don’t get any feedback in the next couple of days, I’ll make an executive decision in the interest of getting 0.8 merged, so speak up now if you have an opinion! (Tagging in @effigies @adelavega @yarikoptic @gkiar)
How about `ignore` and `attend`? 😛

Seems reasonable. I don’t want to deal with `.gitignore` inside pybids though if I can help it. It makes more sense to me to have the bids-validator deal with `.bidsignore` than to do file validation in the validator and pre-screening in pybids. The former seems cleaner and more maintainable. I propose we do this by having the `BIDSValidator` take the `.gitignore` information either at initialization, or in a `.setup()` call made prior to any attempt to validate individual files. Then the `BIDSValidator` can internally set up its own rules for screening files.

That would make things extremely simple on the pybids end, because, as you suggest, anything passed to `ignore` would literally be appended to the `.gitignore` list passed to the validator. So that would be handled in one step. It would also make it really easy to handle files in `force_index`, because the internal `_validate_file` call could first check if the file matches something in `force_index`, and if it does, it would never even make a call to the `BIDSValidator`.
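A rough sketch of that control flow, under stated assumptions: `StubValidator` is a stand-in written for this example and does not reflect the real `BIDSValidator` interface, the glob-style matching via `fnmatchcase` is a guess at how patterns might be compared, and `validate_file` is a hypothetical free-function version of the internal `_validate_file` call.

```python
from fnmatch import fnmatchcase


class StubValidator:
    """Stand-in validator for illustration; NOT the bids-validator API."""

    def __init__(self, ignore_patterns=()):
        # Ignore rules are handed over once, up front.
        self.ignore_patterns = list(ignore_patterns)
        self.calls = 0

    def is_valid(self, path):
        self.calls += 1
        if any(fnmatchcase(path, pat) for pat in self.ignore_patterns):
            return False
        # Toy rule standing in for real spec validation.
        return path.startswith("sub-")


def validate_file(path, validator, force_index=()):
    """Check force_index first; only fall through to the validator if needed."""
    if any(fnmatchcase(path, pat) for pat in force_index):
        return True
    return validator.is_valid(path)


# Anything the user passes to `ignore` is appended to the validator's list.
user_ignore = ["*/tmp/*"]
validator = StubValidator(ignore_patterns=user_ignore)

print(validate_file("stimuli/face.png", validator, force_index=["stimuli/*"]))
print(validator.calls)  # the validator was never consulted for the forced file
```

The point of the ordering is that a forced file bypasses both the spec check and the ignore rules, so `force_index` wins whenever the two conflict.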