How to handle file exclusions and inclusions in 0.8?

This is an issue that has cropped up repeatedly in various guises (e.g., #215, #364, #277, #184, #131, and probably others). The question is how to allow users to specify explicit inclusion and exclusion paths at BIDSLayout initialization. The reason for bringing this up again is that, as of version 0.8 (see #369), pybids will no longer depend on grabbit. The cord-cutting means we can no longer rely on the behavior implemented in grabbit. Since this was entirely undocumented in pybids, I think we have a good opportunity to start afresh and hopefully settle on something that works for everyone.

The main constraints I think we should try to respect are:

- We want to exclude a bunch of hard-coded subdirectories by default (e.g., 'code', 'stimuli', 'sourcedata', etc.)
- Users should be able to easily override any of the default exclusions and make sure they’re indexed
- Users should be able to specify arbitrary directories anywhere in the file system that should not be indexed (in the event they’re encountered in any raw or derivatives BIDSLayout)

The current approach doesn’t allow users to specify explicit exclusions at all (well, it does, but only through an undocumented grabbit feature). It uses an include argument only as a means of negating the default exclusions. E.g., if you want 'stimuli/' to be indexed, you pass include=['stimuli']. Beyond this, there’s no pybids-level ability to control inclusions or exclusions (aside from specifying derivatives, which is a separate matter that I think we’re handling in a satisfactory way). I don’t think this is satisfactory, and a bunch of the opened issues reflect that.
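To make the current behavior concrete, here is a minimal sketch in plain Python. It does not call pybids or grabbit; the function name `is_indexed_current` and the three-item default list are assumptions for illustration (the issue names only these three directories and then says "etc."), and matching on the top-level directory is a simplification.

```python
from pathlib import PurePosixPath

# Illustrative subset only; the real default list is not spelled out in the issue.
DEFAULT_EXCLUSIONS = {"code", "stimuli", "sourcedata"}


def is_indexed_current(path, include=()):
    """Model of the current approach: `include` can only cancel default exclusions."""
    active = DEFAULT_EXCLUSIONS - set(include)
    top_level = PurePosixPath(path).parts[0]
    return top_level not in active


print(is_indexed_current("stimuli/face01.png"))                       # False
print(is_indexed_current("stimuli/face01.png", include=["stimuli"]))  # True
print(is_indexed_current("sub-01/anat/sub-01_T1w.nii.gz"))            # True
```

Note that nothing in this model lets a caller exclude an additional directory, which is the gap the proposals try to close.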

Here are a few proposals (feel free to suggest others):

1. Keep the current approach, where include negates values in the default exclusion list, but add an exclude argument that causes any matching files/dirs to be skipped during indexing. The main downside I see here is that the behavior is counterintuitive, as include and exclude act asymmetrically. A potential fix is to give these arguments different names (e.g., override_exclusions and exclude_paths).
2. Stick with just exclude, and have any manually specified value override the default internal list (e.g., if you pass ['code', 'sourcedata'], then things like 'stimuli' will now be indexed, and only files/dirs that match the elements in your list will be skipped). The downside of this is it requires users to know what the default exclusions are, and reproduce them, and this will probably get pretty messy.
3. Get rid of the current default exclusion list entirely, and treat exclude as a strict list of paths to exclude from indexing. Now that the validator is working properly, directories like 'stimuli' will automatically be skipped if validate=True, because files won’t pass the validator unless they’re explicitly part of the spec. The downside of this option is that it makes it difficult to index selectively; e.g., if you want to index only what’s in 'stimuli', you need to set validate=False and then pass a whole pile of exclusions (i.e., everything that doesn’t pass the validator except for 'stimuli').
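The three proposals differ only in how the set of active exclusions is computed, which a small plain-Python sketch can show side by side. None of this is pybids code: the `skip_proposal_*` functions, the `is_bids` callable standing in for the validator, and the three-item default list are all hypothetical, and matching on any path component is a simplification.

```python
from pathlib import PurePosixPath

# Illustrative subset of the hard-coded defaults.
DEFAULT_EXCLUSIONS = frozenset({"code", "stimuli", "sourcedata"})


def _hits(path, names):
    """True if any component of `path` appears in `names`."""
    return any(part in names for part in PurePosixPath(path).parts)


def skip_proposal_1(path, override_exclusions=(), exclude_paths=()):
    # Defaults, minus anything the user re-enables, plus extra exclusions.
    active = (DEFAULT_EXCLUSIONS - set(override_exclusions)) | set(exclude_paths)
    return _hits(path, active)


def skip_proposal_2(path, exclude=None):
    # A user-supplied list replaces the defaults wholesale.
    active = DEFAULT_EXCLUSIONS if exclude is None else set(exclude)
    return _hits(path, active)


def skip_proposal_3(path, exclude=(), validate=True, is_bids=lambda p: True):
    # No defaults at all; the validator (stubbed by `is_bids`) does the screening.
    if _hits(path, set(exclude)):
        return True
    return validate and not is_bids(path)


stim = "stimuli/face01.png"
print(skip_proposal_1(stim))                                    # True
print(skip_proposal_1(stim, override_exclusions=["stimuli"]))   # False
print(skip_proposal_2(stim, exclude=["code", "sourcedata"]))    # False
print(skip_proposal_3(stim, validate=False))                    # False
```

Under proposal 2, passing `exclude=["code", "sourcedata"]` silently re-enables 'stimuli', which is the "users must reproduce the defaults" downside described above.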

I lean towards (1) (with more explicit argument names). Thoughts? If I don’t get any feedback in the next couple of days, I’ll make an executive decision in the interest of getting 0.8 merged, so speak up now if you have an opinion! (Tagging in @effigies @adelavega @yarikoptic @gkiar)

Issue Analytics

Created 5 years ago
Comments: 20 (3 by maintainers)

Top GitHub Comments

3 reactions

effigies commented, Feb 5, 2019

How about ignore and attend? 😛

1 reaction

tyarkoni commented, Feb 6, 2019

Seems reasonable. I don’t want to deal with .gitignore inside pybids though if I can help it. It makes more sense to me to have the bids-validator deal with .bidsignore than to do file validation in the validator and pre-screening in pybids. The former seems cleaner and more maintainable. I propose we do this by having the BIDSValidator take the .gitignore information either at initialization, or in a .setup() call made prior to any attempt to validate individual files. Then the BIDSValidator can internally set up its own rules for screening files.

That would make things extremely simple on the pybids end, because, as you suggest, anything passed to ignore would literally be appended to the .gitignore list passed to the validator. So that would be handled in one step. It would also make it really easy to handle files in force_index, because the internal _validate_file call could first check if the file matches something in force_index, and if it does, it would never even make a call to the BIDSValidator.
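The flow described here could look roughly like the sketch below. Everything in it is a stand-in: `StubValidator` is not the real BIDSValidator and does not reflect its API, `build_validator` and `validate_file` are hypothetical names, the "starts with sub-" rule is a toy replacement for real BIDS validation, and `fnmatch` patterns only approximate ignore-file syntax.

```python
from fnmatch import fnmatchcase


class StubValidator:
    """Hypothetical stand-in for the validator; receives ignore patterns up front."""

    def __init__(self, ignore_patterns=()):
        self.ignore_patterns = list(ignore_patterns)
        self.calls = 0  # counts how often the validator is consulted

    def is_bids(self, path):
        self.calls += 1
        if any(fnmatchcase(path, pat) for pat in self.ignore_patterns):
            return False
        return path.startswith("sub-")  # toy rule, not real BIDS validation


def build_validator(ignore_file_patterns, ignore=()):
    # Anything passed to `ignore` is appended to the patterns read from the ignore file.
    return StubValidator(list(ignore_file_patterns) + list(ignore))


def validate_file(path, validator, force_index=()):
    # Files matching `force_index` are accepted without ever calling the validator.
    if any(fnmatchcase(path, pat) for pat in force_index):
        return True
    return validator.is_bids(path)


validator = build_validator(["*.tmp"], ignore=["sub-02/*"])
print(validate_file("sub-01/anat/sub-01_T1w.nii.gz", validator))             # True
print(validate_file("sub-02/anat/sub-02_T1w.nii.gz", validator))             # False
print(validate_file("stimuli/face01.png", validator, force_index=["stimuli/*"]))  # True
print(validator.calls)                                                       # 2
```

The final call count of 2 shows the short-circuit: the forced 'stimuli' file never reaches the validator.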