Unable to train using ivadomed because of a UnicodeDecodeError
Issue description
Unable to train a 2D segmentation model due to a UnicodeDecodeError occurring outside the scope of ivadomed.
Current behavior
I am trying to train a model on a BIDSified dataset but am encountering a decoding error, specifically: UnicodeDecodeError 'utf-8' codec can't decode byte 0xc4 in position 1430: invalid continuation byte. This is caused by a non-ASCII character in one of the json files that is not valid UTF-8, which raises an error while the indexing is done. I ran the debugger to pinpoint the source, but because the failure is a decoding error, the furthest the debugger reaches is this line in loader/bids_dataframe.py. This initially suggested that some parameters of BIDSLayout could be tweaked to avoid the error.
Things already tried
@mariehbourget suggested looking at this independently of ivadomed by loading the files directly with the json package. I wrote a script to do this and found that 74 json files are flagged as erroneous. However, each json file has ~1800 lines, and I do not see a practical way to look through all of them by hand to find a single offending character.
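Rather than reading ~1800 lines per file, the offending byte can be located programmatically, because UnicodeDecodeError carries the byte offset of the failure in its start attribute. The following is a minimal sketch, not part of ivadomed or pybids; the function name find_undecodable_json and the context parameter are hypothetical, and it only reports the first invalid byte in each file.

```python
from pathlib import Path


def find_undecodable_json(root, context=20):
    """Yield (path, byte_offset, surrounding_bytes) for every .json file
    under `root` that cannot be decoded as UTF-8.

    Only the first invalid byte of each file is reported.
    """
    for path in sorted(Path(root).rglob("*.json")):
        raw = path.read_bytes()
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError as err:
            # err.start is the offset of the first byte that failed to decode
            lo = max(err.start - context, 0)
            yield path, err.start, raw[lo:err.start + context]


# Example usage (the dataset path is a placeholder):
# for path, offset, snippet in find_undecodable_json("path/to/bids_dataset"):
#     print(f"{path}: byte {offset}: {snippet!r}")
```

Printing the surrounding bytes with repr shows the neighbouring field name, which is usually enough to tell which json key holds the problematic value.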
@jcohenadad Our hunch about using validate=False (which is True by default) in the pybids.BIDSLayout call (see the line linked above) did not work: I got the same error. I also added validate=False in this line for the indexer, but still no luck.
Since I will only be using the .nii.gz files, I thought this error could be avoided by not indexing any metadata in the first place. I therefore tried adding the argument index_metadata=False (it is set to True by default) in this line for the BIDS layout indexer class. Now I get a different error: TypeError cannot unpack non-iterable NoneType object (possibly because “metadata” is now None and cannot be returned). It therefore seems that indexing the metadata is crucial for other ivadomed functionality, and that I cannot simply skip it without breaking other function calls.
I am wondering whether there is any way to skip indexing the metadata (json) files and simply load the .nii.gz files for training. Any suggestions are highly appreciated!
(tagging @charleygros @andreanne-lemay in case they have experienced anything like this?)
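A possible workaround that was not tried in this issue is to re-encode the offending json files as UTF-8 so that the indexer can read them unchanged. The sketch below assumes the files were written in Latin-1 (ISO-8859-1), which is only a guess based on the byte 0xc4 reported in the error; the function name reencode_to_utf8 and its parameters are hypothetical. It keeps a backup of the original bytes, since a wrong guess about the source encoding would silently produce wrong characters.

```python
from pathlib import Path


def reencode_to_utf8(path, source_encoding="latin-1", backup_suffix=".bak"):
    """Rewrite `path` as UTF-8 if it is not already valid UTF-8.

    Returns True if the file was rewritten, False if it was left untouched.
    """
    path = Path(path)
    raw = path.read_bytes()
    try:
        raw.decode("utf-8")
        return False  # already valid UTF-8, nothing to do
    except UnicodeDecodeError:
        pass
    # ASSUMPTION: the file was originally written in `source_encoding`
    text = raw.decode(source_encoding)
    # Keep the original bytes next to the file before overwriting it
    path.with_name(path.name + backup_suffix).write_bytes(raw)
    path.write_bytes(text.encode("utf-8"))
    return True
```

It is worth opening one converted file and checking that the affected value reads correctly before applying this to all flagged files.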
Full error trace of the UnicodeDecodeError
Traceback (most recent call last):
  File "ivadomed/ivadomed/main.py", line 574, in <module>
    run_main()
  File "ivadomed/ivadomed/main.py", line 567, in run_main
    run_command(context=context,
  File "ivadomed/ivadomed/main.py", line 352, in run_command
    bids_df = BidsDataframe(loader_params, path_output, derivatives=True)
  File "/home/nagakarthik/deepLearning/ivadomed_experiments/ivadomed/ivadomed/loader/bids_dataframe.py", line 65, in __init__
    self.create_bids_dataframe()
  File "/home/nagakarthik/deepLearning/ivadomed_experiments/ivadomed/ivadomed/loader/bids_dataframe.py", line 115, in create_bids_dataframe
    layout = pybids.BIDSLayout(str(path_data), config=self.bids_config, indexer=indexer,
  File "/home/nagakarthik/deepLearning/venvDL/lib/python3.8/site-packages/bids/layout/layout.py", line 156, in __init__
    indexer(self)
  File "/home/nagakarthik/deepLearning/venvDL/lib/python3.8/site-packages/bids/layout/index.py", line 112, in __call__
    self._index_metadata()
  File "/home/nagakarthik/deepLearning/venvDL/lib/python3.8/site-packages/bids/layout/index.py", line 380, in _index_metadata
    file_md.update(pl())
  File "/home/nagakarthik/deepLearning/venvDL/lib/python3.8/site-packages/bids/layout/index.py", line 275, in load_json
    return json.load(handle)
  File "/usr/lib/python3.8/json/__init__.py", line 293, in load
    return loads(fp.read(),
  File "/usr/lib/python3.8/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc4 in position 1430: invalid continuation byte
Error trace when index_metadata=False
  File "/home/nagakarthik/deepLearning/ivadomed_experiments/ivadomed/ivadomed/loader/bids_dataset.py", line 103, in __init__
    df_sub, roi_filename, target_filename, metadata = self.create_filename_pair(multichannel_subjects, subject,
  File "/home/nagakarthik/deepLearning/ivadomed_experiments/ivadomed/ivadomed/loader/loader.py", line 91, in load_dataset
    dataset = BidsDataset(bids_df=bids_df,
  File "/home/nagakarthik/deepLearning/ivadomed_experiments/ivadomed/ivadomed/main.py", line 118, in get_dataset
    ds = imed_loader.load_dataset(bids_df, **{**loader_params, **{'data_list': data_lst,
  File "/home/nagakarthik/deepLearning/ivadomed_experiments/ivadomed/ivadomed/main.py", line 387, in run_command
    ds_valid = get_dataset(bids_df, loader_params, valid_lst, transform_valid_params, cuda_available, device,
  File "/home/nagakarthik/deepLearning/ivadomed_experiments/ivadomed/ivadomed/main.py", line 567, in run_main
    run_command(context=context,
  File "/home/nagakarthik/deepLearning/ivadomed_experiments/ivadomed/ivadomed/main.py", line 574, in <module> (Current frame)
    run_main()
This is a good resource: https://blog.codinghorror.com/the-great-newline-schism/
Namely, any files generated on Windows and opened on Linux, or the reverse, may present this problem when not obtained through git (git checkin/checkout takes care of this in the background).
Thanks for the config file, I don’t see anything that would require metadata.
I tried the same as you, adding index_metadata=False in this line, and it worked for me. So the TypeError cannot unpack non-iterable NoneType object may not be related to the metadata itself.