How to handle file exclusions and inclusions in 0.8?

This is an issue that has cropped up repeatedly in various guises (e.g., #215, #364, #277, #184, #131, and probably others). The question is how to allow users to specify explicit inclusion and exclusion paths at BIDSLayout initialization. The reason for bringing this up again is that, as of version 0.8 (see #369), pybids will no longer depend on grabbit. The cord-cutting means we can no longer rely on the behavior implemented in grabbit. Since that behavior was entirely undocumented in pybids, I think we have a good opportunity to start afresh and hopefully settle on something that works for everyone.

The main constraints I think we should try to respect are:

  • We want to exclude a bunch of hard-coded subdirectories by default (e.g., 'code', 'stimuli', 'sourcedata', etc.).
  • Users should be able to easily override any of the default exclusions so that those directories are indexed.
  • Users should be able to specify arbitrary directories anywhere in the file system that should not be indexed (in the event they’re encountered in any raw or derivatives BIDSLayout).

The current approach doesn’t allow users to specify explicit exclusions at all (well, it does, but only through an undocumented grabbit feature). It uses an include argument only as a means of negating the default exclusions. E.g., if you want 'stimuli/' to be indexed, you pass include=['stimuli']. Beyond this, there’s no pybids-level ability to control inclusions or exclusions (aside from specifying derivatives, which is a separate matter that I think we’re handling in a satisfactory way). I don’t think this is satisfactory, and a bunch of the opened issues reflect that.
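The current include behavior described above can be sketched as follows. This is a hypothetical illustration, not pybids code: the function name effective_exclusions is made up, and DEFAULT_EXCLUSIONS holds only the three examples named in the issue (the real list has more entries).

```python
# Hypothetical default list, limited to the examples named in the issue.
DEFAULT_EXCLUSIONS = ["code", "stimuli", "sourcedata"]


def effective_exclusions(include=None):
    """Return the default exclusions minus anything listed in `include`.

    Sketch of the current behavior: `include` can only negate defaults;
    there is no way to add a new exclusion through this argument.
    """
    include = set(include or [])
    return [d for d in DEFAULT_EXCLUSIONS if d not in include]


# Passing include=['stimuli'] re-enables indexing of 'stimuli/'.
print(effective_exclusions(["stimuli"]))
```

The asymmetry is visible in the signature: nothing a user passes can make the exclusion list longer, which is the gap the issues listed above keep running into.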

Here are a few proposals (feel free to suggest others):

  1. Keep the current approach, where include negates values in the default exclusion list, but add an exclude argument that causes any matching files/dirs to be skipped during indexing. The main downside I see here is that the behavior is counterintuitive, as include and exclude act asymmetrically. A potential fix is to give these arguments different names (e.g., override_exclusions and exclude_paths).

  2. Stick with just exclude, and have any manually specified value override the default internal list (e.g., if you pass ['code', 'sourcedata'], then things like 'stimuli' will now be indexed, and only files/dirs that match the elements in your list will be skipped). The downside is that it requires users to know what the default exclusions are and to reproduce them, which will probably get pretty messy.

  3. Get rid of the current default exclusion list entirely, and treat exclude as a strict list of paths to exclude from indexing. Now that the validator is working properly, directories like 'stimuli' will automatically be skipped if validate=True, because files won’t pass the validator unless they’re explicitly part of the spec. The downside of this option is that it makes it difficult to index selectively. For example, if you want to index only what’s in 'stimuli', you need to set validate=False and then pass a whole pile of exclusions (i.e., everything that doesn’t pass the validator except for 'stimuli').
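To make option (1) concrete, here is a minimal sketch using the more explicit argument names suggested there (override_exclusions and exclude_paths). It is an illustration under stated assumptions, not the pybids implementation: should_index is a hypothetical helper, paths are assumed to be absolute and already normalized, and default exclusions are assumed to apply only to top-level subdirectories of the dataset root.

```python
from pathlib import PurePosixPath

# Assumed defaults, taken from the examples in the issue ("etc." omitted).
DEFAULT_EXCLUSIONS = {"code", "stimuli", "sourcedata"}


def should_index(path, root, override_exclusions=(), exclude_paths=()):
    """Decide whether `path` should be indexed under option (1).

    Hypothetical sketch: `override_exclusions` re-enables default-excluded
    top-level directories; `exclude_paths` skips arbitrary directories
    anywhere in the file system. Paths are assumed absolute and normalized.
    """
    path = PurePosixPath(path)
    root = PurePosixPath(root)

    # User-specified exclusions apply anywhere, inside or outside the dataset.
    for excluded in map(PurePosixPath, exclude_paths):
        if path == excluded or excluded in path.parents:
            return False

    # Default exclusions only apply to top-level subdirectories of the root,
    # and only if the user has not overridden them.
    try:
        rel_parts = path.relative_to(root).parts
    except ValueError:
        return True  # outside this dataset root; defaults do not apply
    active_defaults = DEFAULT_EXCLUSIONS - set(override_exclusions)
    return not (rel_parts and rel_parts[0] in active_defaults)


print(should_index("/data/ds/stimuli/face.png", "/data/ds"))
print(should_index("/data/ds/stimuli/face.png", "/data/ds",
                   override_exclusions=["stimuli"]))
```

Keeping the two arguments separate is what removes the asymmetry complaint: one only ever shrinks the default list, the other only ever adds new exclusions.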

I lean towards (1) (with more explicit argument names). Thoughts? If I don’t get any feedback in the next couple of days, I’ll make an executive decision in the interest of getting 0.8 merged, so speak up now if you have an opinion! (Tagging in @effigies @adelavega @yarikoptic @gkiar)

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 20 (3 by maintainers)

Top GitHub Comments

effigies commented, Feb 5, 2019 (3 reactions):

How about ignore and attend? 😛

tyarkoni commented, Feb 6, 2019 (1 reaction):

Seems reasonable. I don’t want to deal with .gitignore inside pybids though if I can help it. It makes more sense to me to have the bids-validator deal with .bidsignore than to do file validation in the validator and pre-screening in pybids. The former seems cleaner and more maintainable. I propose we do this by having the BIDSValidator take the .gitignore information either at initialization, or in a .setup() call made prior to any attempt to validate individual files. Then the BIDSValidator can internally set up its own rules for screening files.

That would make things extremely simple on the pybids end, because, as you suggest, anything passed to ignore would literally be appended to the .gitignore list passed to the validator. So that would be handled in one step. It would also make it really easy to handle files in force_index, because the internal _validate_file call could first check if the file matches something in force_index, and if it does, it would never even make a call to the BIDSValidator.
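The flow proposed in this comment can be sketched as below. Everything here is hypothetical: StubValidator stands in for a BIDSValidator that accepts ignore rules via a setup() call (the proposal, not an existing API), make_validate_file is a made-up factory, and fnmatch is used as a rough approximation of .gitignore-style matching (for instance, its "*" also matches "/").

```python
import fnmatch


class StubValidator:
    """Hypothetical stand-in for a BIDSValidator that takes ignore rules
    in a setup() call before validating individual files."""

    def __init__(self):
        self.ignore = []

    def setup(self, ignore):
        self.ignore = list(ignore)

    def is_bids(self, path):
        # Ignore rules are applied inside the validator, not in pybids.
        if any(fnmatch.fnmatch(path, pat) for pat in self.ignore):
            return False
        # Placeholder rule standing in for the real BIDS spec checks.
        return path.startswith("/sub-")


def make_validate_file(bidsignore, ignore=(), force_index=()):
    validator = StubValidator()
    # Anything passed to `ignore` is simply appended to the ignore rules.
    validator.setup(list(bidsignore) + list(ignore))

    def _validate_file(path):
        # A force_index match short-circuits: the validator is never called.
        if any(fnmatch.fnmatch(path, pat) for pat in force_index):
            return True
        return validator.is_bids(path)

    return _validate_file


validate = make_validate_file(["*.tmp"], ignore=["/sub-02/*"],
                              force_index=["/stimuli/*"])
print(validate("/stimuli/face.png"), validate("/sub-02/func/bold.nii.gz"))
```

The design point is that pybids holds no screening logic of its own beyond the force_index check; all ignore handling lives in one place inside the validator.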
