Unable to train using ivadomed because of a UnicodeDecodeError

Issue description

Unable to train a 2D segmentation model due to a UnicodeDecodeError occurring outside the scope of ivadomed.

Current behavior

I am trying to train a model on a BIDSified dataset but am encountering a decoding error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc4 in position 1430: invalid continuation byte. This is due to a non-ASCII character in one of the json files, which is flagged as an error during indexing. I ran the debugger to pinpoint the source, but since this is a decoding error, the furthest the debugger reaches is this line in loader/bids_dataframe.py. This initially suggested that some parameters of BIDSLayout could be tweaked to avoid the error.

Things already tried

@mariehbourget suggested looking at this independently of ivadomed by loading the files directly with the json package. I wrote a script to do this and found that 74 json files are flagged as erroneous. However, each json file has ~1800 lines, and I do not see any way to look through all those lines to find a single non-ASCII character.
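Reading through ~1800 lines per file by hand should not be necessary, because the exception itself carries the byte offset of the bad sequence (the "position 1430" in the message). The following is a minimal sketch, not the script mentioned above: the function names `find_invalid_utf8` and `scan_dataset` are hypothetical, and it assumes only that the metadata files end in `.json` somewhere under the dataset root.

```python
from pathlib import Path


def find_invalid_utf8(path, context=20):
    """Return (byte_offset, line_number, surrounding_bytes) for the first
    invalid UTF-8 sequence in `path`, or None if the file decodes cleanly."""
    raw = Path(path).read_bytes()
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the same byte offset that the error message reports.
        line_number = raw.count(b"\n", 0, err.start) + 1
        snippet = raw[max(0, err.start - context): err.start + context]
        return err.start, line_number, snippet
    return None


def scan_dataset(root):
    """Map each .json file under `root` that is not valid UTF-8
    to the location of its first offending byte."""
    problems = {}
    for json_path in sorted(Path(root).rglob("*.json")):
        hit = find_invalid_utf8(json_path)
        if hit is not None:
            problems[json_path] = hit
    return problems


# Example usage (the dataset path is a placeholder):
# for path, (offset, line, snippet) in scan_dataset("path/to/bids_dataset").items():
#     print(f"{path}: byte {offset}, line {line}: {snippet!r}")
```

Printing the surrounding bytes with `repr` shows the offending byte as an escape such as `\xc4` next to readable text, which usually makes it clear which field holds the problematic character. Only the first bad sequence per file is reported, so a file may need more than one pass.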

@jcohenadad Our hunch about using validate=False (which is True by default) in the pybids.BIDSLayout call (see the line’s link above) did not work, as I got the same error. I also added validate=False in this line for the indexer, but still no luck.

Since I will only be using the .nii.gz files, I thought that this error could be prevented if we don’t index any metadata in the first place. Therefore, I tried adding the argument index_metadata=False (which was set to True by default) in this line for the BIDS layout indexer class. Now I am getting this error: TypeError: cannot unpack non-iterable NoneType object (possibly because “metadata” is now None and it cannot be returned). Therefore, it seems that indexing the metadata is crucial for other ivadomed functionalities, and I cannot simply skip indexing the metadata without breaking other function calls.

I am wondering whether there is any way to skip indexing the metadata (json) files and simply load the .nii.gz files for training. Any suggestions are highly appreciated! (tagging @charleygros @andreanne-lemay in case they have experienced anything like this?)

Full error trace of the UnicodeDecodeError

Traceback (most recent call last):
  File "ivadomed/ivadomed/main.py", line 574, in <module>
    run_main()
  File "ivadomed/ivadomed/main.py", line 567, in run_main
    run_command(context=context,
  File "ivadomed/ivadomed/main.py", line 352, in run_command
    bids_df = BidsDataframe(loader_params, path_output, derivatives=True)
  File "/home/nagakarthik/deepLearning/ivadomed_experiments/ivadomed/ivadomed/loader/bids_dataframe.py", line 65, in __init__
    self.create_bids_dataframe()
  File "/home/nagakarthik/deepLearning/ivadomed_experiments/ivadomed/ivadomed/loader/bids_dataframe.py", line 115, in create_bids_dataframe
    layout = pybids.BIDSLayout(str(path_data), config=self.bids_config, indexer=indexer,
  File "/home/nagakarthik/deepLearning/venvDL/lib/python3.8/site-packages/bids/layout/layout.py", line 156, in __init__
    indexer(self)
  File "/home/nagakarthik/deepLearning/venvDL/lib/python3.8/site-packages/bids/layout/index.py", line 112, in __call__
    self._index_metadata()
  File "/home/nagakarthik/deepLearning/venvDL/lib/python3.8/site-packages/bids/layout/index.py", line 380, in _index_metadata
    file_md.update(pl())
  File "/home/nagakarthik/deepLearning/venvDL/lib/python3.8/site-packages/bids/layout/index.py", line 275, in load_json
    return json.load(handle)
  File "/usr/lib/python3.8/json/__init__.py", line 293, in load
    return loads(fp.read(),
  File "/usr/lib/python3.8/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc4 in position 1430: invalid continuation byte

Error trace when index_metadata=False

  File "/home/nagakarthik/deepLearning/ivadomed_experiments/ivadomed/ivadomed/loader/bids_dataset.py", line 103, in __init__
    df_sub, roi_filename, target_filename, metadata = self.create_filename_pair(multichannel_subjects, subject,
  File "/home/nagakarthik/deepLearning/ivadomed_experiments/ivadomed/ivadomed/loader/loader.py", line 91, in load_dataset
    dataset = BidsDataset(bids_df=bids_df,
  File "/home/nagakarthik/deepLearning/ivadomed_experiments/ivadomed/ivadomed/main.py", line 118, in get_dataset
    ds = imed_loader.load_dataset(bids_df, **{**loader_params, **{'data_list': data_lst,
  File "/home/nagakarthik/deepLearning/ivadomed_experiments/ivadomed/ivadomed/main.py", line 387, in run_command
    ds_valid = get_dataset(bids_df, loader_params, valid_lst, transform_valid_params, cuda_available, device,
  File "/home/nagakarthik/deepLearning/ivadomed_experiments/ivadomed/ivadomed/main.py", line 567, in run_main
    run_command(context=context,
  File "/home/nagakarthik/deepLearning/ivadomed_experiments/ivadomed/ivadomed/main.py", line 574, in <module> (Current frame)
    run_main()

Top GitHub Comments

dyt811 commented, Nov 1, 2021

This is a good resource: https://blog.codinghorror.com/the-great-newline-schism/

Namely, any file generated on Windows and opened on Linux, or the reverse, may present this problem when not obtained through git (git check-in/checkout takes care of this in the background).
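In the same vein, files written on another platform can differ in character encoding as well as in line endings. Byte 0xc4 is "Ä" in both Latin-1 and Windows-1252, so one possible workaround is to re-encode the offending json files as UTF-8 before indexing. This is a minimal sketch under that assumption: the issue does not confirm the source encoding, `reencode_to_utf8` is a hypothetical helper, and it overwrites files in place, so it is best run on a copy of the dataset.

```python
from pathlib import Path


def reencode_to_utf8(path, source_encoding="cp1252"):
    """Rewrite `path` as UTF-8 if it is not already valid UTF-8.

    Returns True if the file was rewritten, False if it was left untouched.
    `source_encoding` is a guess (assumption), not something the issue confirms.
    """
    path = Path(path)
    raw = path.read_bytes()
    try:
        raw.decode("utf-8")
        return False  # already valid UTF-8, nothing to do
    except UnicodeDecodeError:
        pass
    # Raises UnicodeDecodeError if the bytes are not valid in the guessed encoding.
    text = raw.decode(source_encoding)
    # Work on bytes throughout so that line endings are left unchanged.
    path.write_bytes(text.encode("utf-8"))
    return True


# Example usage (the file path is a placeholder):
# reencode_to_utf8("path/to/bids_dataset/sub-01/anat/sub-01_T1w.json")
```

After re-encoding, it is worth opening one of the files and checking that the affected field reads as expected, since a wrong encoding guess produces valid UTF-8 with the wrong characters rather than an error.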

mariehbourget commented, Nov 2, 2021

Thanks for the config file, I don’t see anything that would require metadata.

When you say you tried index_metadata=False, you mean that you changed that particular argument in this line, right? Or did you add an additional key-value parameter in your ivadomed/config/config.json file? I have not tried the latter, but when I changed that argument manually, it gave another error (already mentioned above) in this line.

I tried the same as you, adding index_metadata=False in this line, and it worked for me. So the TypeError cannot unpack non-iterable NoneType object may not be related to the metadata itself.