Target suffix loading: use of `contains` may pick multiple targets

Issue description

Current behavior

The current implementation of target suffix loading uses the contains() string method to select target files, as shown in the lines below:

https://github.com/ivadomed/ivadomed/blob/76b36a0a0f7141feb2d5b00b33e4c3a06865fc2c/ivadomed/loader/bids_dataframe.py#L125-L128

This means that

  • if your derivatives contain multiple filenames that overlap (e.g. _lesion-manual.nii.gz and _lesion-manual2.nii.gz) and
  • if you use a "target_suffix" containing the overlap (e.g. _lesion-manual),

then the current loading process picks all of the filenames with the given overlap as potential targets.

This is problematic because the user might (reasonably) think that only _lesion-manual.nii.gz will be used as the ground-truth when the "target_suffix" is specified as _lesion-manual, whereas in reality both _lesion-manual.nii.gz and _lesion-manual2.nii.gz are used.

I haven’t yet dug deeper into the codebase to see what happens when multiple targets are picked during the loading process, but based on my experiments with ivadomed --test, I can confirm that the GT that ends up being used changes from run to run.
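The overlap can be illustrated without ivadomed. The sketch below assumes the filter behaves like a regex search over the `|`-joined suffixes (which is how a `contains`-style match on a joined pattern works); the filenames are hypothetical and are not taken from the dataset.

```python
import re

# Hypothetical derivative filenames for a single subject.
filenames = [
    "sub-01_UNIT1_lesion-manual.nii.gz",
    "sub-01_UNIT1_lesion-manual2.nii.gz",
    "sub-01_UNIT1_seg-manual.nii.gz",
]
target_suffix = ["_lesion-manual"]

# Substring-style matching: a file is kept if any suffix occurs
# anywhere in its name, not only at the end of the stem.
pattern = "|".join(target_suffix)
matched = [f for f in filenames if re.search(pattern, f)]

print(matched)
```

Both `_lesion-manual.nii.gz` and `_lesion-manual2.nii.gz` end up in `matched`, which is the ambiguity described above.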

Expected behavior

I would expect the "target_suffix" to exactly match the target filename. Two options we have are:

  1. Changing line 128 (linked above) to something like

    & df_next['filename'].str.split(os.extsep).apply(lambda x: x[0].endswith(tuple(self.target_suffix))))]
    

    which enforces an exact match of target suffix.

  2. In case the use of contains() is justified in some scenarios and it was implemented that way for a specific reason, the user can instead give a full "target_suffix" including the file extension, such as _lesion-manual.nii.gz. I have tested this and can confirm it solves the problem.

The latter option is notably a lot easier to implement as it doesn’t require a change in the codebase. However, I think this is an issue many people might face without knowing it in the future.
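Both options can be compared side by side in plain Python. This is only a sketch under assumptions: `matches_exact` is a hypothetical helper, not an ivadomed function, and the filenames are made up. Note that `str.endswith` compares literal strings (or a tuple of them) rather than a regex, so the sketch passes a tuple of suffixes.

```python
import os

def matches_exact(filename, target_suffix):
    """Option 1 (hypothetical helper): the part of the filename before
    the first extension separator must end with one of the suffixes."""
    stem = filename.split(os.extsep)[0]
    return stem.endswith(tuple(target_suffix))

filenames = [
    "sub-01_lesion-manual.nii.gz",
    "sub-01_lesion-manual2.nii.gz",
]

# Option 1: exact match on the stem.
option_1 = [f for f in filenames if matches_exact(f, ["_lesion-manual"])]

# Option 2: keep substring matching, but include the file extension
# in the suffix so the overlapping name no longer matches.
full_suffix = "_lesion-manual.nii.gz"
option_2 = [f for f in filenames if full_suffix in f]

print(option_1)
print(option_2)
```

In this sketch both options select only `sub-01_lesion-manual.nii.gz`. Option 1 assumes the stem contains no extension separator before the actual extension.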

Steps to reproduce

  1. Download the basel-mp2rage dataset as shown here.
  2. Run preprocessing on the data as shown here.
  3. Insert print(bids_df.get_deriv_fnames()) after line 370 in ivadomed/main.py which initializes the BIDS data frame.
  4. Run the following config file with ivadomed --train.
  5. Take a look at the output of the print statement from step 3.

However, these steps are specific to one dataset, and it is understandable that not everybody will be willing to run preprocessing etc. on this particular dataset. In that case, I would like to point out that this issue is reproducible with any dataset that has multiple annotations per subject whose target suffixes overlap.
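To check whether another dataset is affected, one could scan its derivative filenames for suffixes that match more than one file per subject. The sketch below is a hypothetical helper, not part of ivadomed; it assumes BIDS-style names where the subject is the first underscore-separated field, and it uses made-up filenames.

```python
from collections import defaultdict

def find_ambiguous_targets(filenames, target_suffix):
    """Return {(subject, suffix): [files]} for every suffix that
    matches more than one file of the same subject."""
    hits = defaultdict(list)
    for fname in filenames:
        subject = fname.split("_")[0]  # assumes "sub-XX_..." naming
        for suffix in target_suffix:
            if suffix in fname:  # substring match, as in the reported behavior
                hits[(subject, suffix)].append(fname)
    return {key: files for key, files in hits.items() if len(files) > 1}

# Hypothetical derivative filenames.
filenames = [
    "sub-01_lesion-manual.nii.gz",
    "sub-01_lesion-manual2.nii.gz",
    "sub-02_lesion-manual.nii.gz",
]
ambiguous = find_ambiguous_targets(filenames, ["_lesion-manual"])
print(ambiguous)
```

Here only `sub-01` is reported, because it is the only subject with two files containing `_lesion-manual`.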

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 8 (8 by maintainers)

Top GitHub Comments

1 reaction
uzaymacar commented, Mar 10, 2022

Feel free to take it on @mariehbourget! I only created a branch so it’s OK.

1 reaction
mariehbourget commented, Mar 10, 2022

While debugging another issue (#1096) from an external user, I came across this exact problem. In this case, we have the following target_suffix values (coming from the segmentation in ADS):

  • _seg-axon
  • _seg-axonmyelin
  • _seg-myelin

With _seg-axon in the config file, both _seg-axon and _seg-axonmyelin are picked up during indexing, and one or the other is used for training, which is not good at all.

I’ll inform the user for now and tag this issue as high priority.
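If a regex-based filter is to be kept, one possible way to avoid this overlap is to escape each suffix and require that it be followed directly by the extension separator. The sketch below is an assumption-laden illustration, not the fix adopted in ivadomed: `build_suffix_pattern` is a hypothetical helper, and the filenames (including the `.png` extension) are made up.

```python
import re

def build_suffix_pattern(target_suffix):
    """Hypothetical helper: match any suffix only when it is
    immediately followed by the extension separator."""
    alternatives = "|".join(re.escape(s) for s in target_suffix)
    return re.compile(rf"(?:{alternatives})(?=\.)")

# Hypothetical derivative filenames with overlapping suffixes.
filenames = [
    "sub-01_sample-01_SEM_seg-axon.png",
    "sub-01_sample-01_SEM_seg-axonmyelin.png",
    "sub-01_sample-01_SEM_seg-myelin.png",
]

pattern = build_suffix_pattern(["_seg-axon"])
selected = [f for f in filenames if pattern.search(f)]
print(selected)
```

With the lookahead in place, `_seg-axon` no longer matches `_seg-axonmyelin`, so only the `_seg-axon` file is selected.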
