
DecodeError: 'utf-8' codec can't decode byte 0x82 in position 1598601: invalid start byte when using PersistentDataset

See original GitHub issue

Hi,

While running a training that uses PersistentDataset, I’m getting a Unicode error when it tries to load the cache files. This happens only after a substantial number of epochs, so the run gets through all the files several times before raising the error. Before every run of the script, I delete the entire contents of the cache directory and make sure it is empty before re-running. The error doesn’t always happen in the same epoch.
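The cleanup step described above (emptying the cache directory before each run) can be sketched with the standard library alone; the cache path below is a made-up example for the demo, not a path from the issue:

```python
import shutil
from pathlib import Path

def reset_cache(cache_dir):
    """Wipe every cached item so a persistent cache is rebuilt from scratch
    on the next run. 'cache_dir' is hypothetical here -- use whatever path
    was passed as the dataset's cache directory."""
    p = Path(cache_dir)
    if p.exists():
        shutil.rmtree(p)  # remove the directory and all cached files in it
    p.mkdir(parents=True, exist_ok=True)  # recreate it empty

reset_cache("/tmp/persistent_cache_demo")
leftover = list(Path("/tmp/persistent_cache_demo").iterdir())
print(len(leftover))  # 0 -- the directory exists and is empty
```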

IMPLEMENTATION DETAILS:

  • PersistentDataset is instantiated with an input dictionary containing: a label file and an image file, a small numpy array, an integer and a string, all under separate keys. The transform is a simple Compose of LoadImageD loading specific keys (‘img’, ‘label’) of NPZ files saved locally.
  • While iterating over this dataset, the cache files are created during the first epoch and then read over many subsequent epochs (~50–100) until the error occurs.

I wasn’t getting this error with older versions of MONAI; I’m now on 0.8.1.

Trace

for i, item in enumerate(self.source):
  File "######/data/spadenai_v2_sliced.py", line 135, in getIteratorFun
    for volumes in dataset:
  File "######/venv/lib/python3.8/site-packages/monai/data/dataset.py", line 97, in __getitem__
    return self._transform(index)
  File "######/venv/lib/python3.8/site-packages/monai/data/dataset.py", line 364, in _transform
    pre_random_item = self._cachecheck(self.data[index])
  File "######/venv/lib/python3.8/site-packages/monai/data/dataset.py", line 330, in _cachecheck
    return torch.load(hashfile)
  File "######/venv/lib/python3.8/site-packages/torch/serialization.py", line 607, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "######/venv/lib/python3.8/site-packages/torch/serialization.py", line 882, in _load
    result = unpickler.load()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x82 in position 1598601: invalid start byte
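The traceback shows torch.load failing while unpickling a cache file: the unpickler hits bytes that are not a valid pickle stream, which typically points at a cache entry that was truncated or overwritten mid-write. A stdlib-only sketch of the same check/load/rebuild shape, with a defensive fallback that drops a corrupt entry and recomputes it (all names here are illustrative, not MONAI’s code):

```python
import hashlib
import pickle
import shutil
from pathlib import Path

def load_or_recompute(cache_dir, key, compute):
    """Load a cached pickle; if the file is corrupt (e.g. truncated by an
    interrupted writer), drop it and recompute. Illustrative only -- the
    hashed filename mirrors the spirit of PersistentDataset's hashfile."""
    hashfile = Path(cache_dir) / hashlib.md5(key.encode()).hexdigest()
    if hashfile.exists():
        try:
            return pickle.loads(hashfile.read_bytes())
        except (pickle.UnpicklingError, UnicodeDecodeError, EOFError):
            hashfile.unlink()  # corrupt cache entry: delete and rebuild below
    value = compute()
    hashfile.parent.mkdir(parents=True, exist_ok=True)
    hashfile.write_bytes(pickle.dumps(value))
    return value

cache = Path("/tmp/cache_demo")
if cache.exists():
    shutil.rmtree(cache)  # start from an empty cache for the demo

calls = []
def make():
    calls.append(1)
    return {"img": "vol_0001.npz", "slice": 3}

first = load_or_recompute(cache, "item-0", make)   # computes and caches
second = load_or_recompute(cache, "item-0", make)  # served from cache

# Simulate the failure mode: truncate the cached file mid-stream.
hashfile = cache / hashlib.md5(b"item-0").hexdigest()
hashfile.write_bytes(hashfile.read_bytes()[:5])
third = load_or_recompute(cache, "item-0", make)   # detects damage, rebuilds
print(len(calls))  # compute ran twice in total
```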

Environment

This occurs in Ubuntu 18.04.6 LTS.

python -c 'import monai; monai.config.print_debug_info()'

Output:

Printing MONAI config…

MONAI version: 0.8.0
Numpy version: 1.19.4
Pytorch version: 1.10.1+cu102
MONAI flags: HAS_EXT = False, USE_COMPILED = False
MONAI rev id: 714d00dffe6653e21260160666c4c201ab66511b

Optional dependencies:
Pytorch Ignite version: NOT INSTALLED or UNKNOWN VERSION.
Nibabel version: 3.0.2
scikit-image version: 0.16.2
Pillow version: 7.1.2
Tensorboard version: 2.3.0
gdown version: NOT INSTALLED or UNKNOWN VERSION.
TorchVision version: 0.11.2+cu102
tqdm version: 4.62.3
lmdb version: NOT INSTALLED or UNKNOWN VERSION.
psutil version: NOT INSTALLED or UNKNOWN VERSION.
pandas version: 1.0.1
einops version: 0.3.2
transformers version: NOT INSTALLED or UNKNOWN VERSION.
mlflow version: 1.22.0

For details about installing the optional dependencies, please visit: https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies

================================
Printing system config…

psutil required for print_system_info

================================
Printing GPU config…

Num GPUs: 1
Has CUDA: True
CUDA version: 10.2
cuDNN enabled: True
cuDNN version: 7605
Current device: 0
Library compiled for CUDA architectures: ['sm_37', 'sm_50', 'sm_60', 'sm_70']
GPU 0 Name: Quadro RTX 8000
GPU 0 Is integrated: False
GPU 0 Is multi GPU board: False
GPU 0 Multi processor count: 72
GPU 0 Total memory (GB): 47.5
GPU 0 CUDA capability (maj.min): 7.5

Issue Analytics

  • State: closed
  • Created: a year ago
  • Reactions: 1
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
Nic-Ma commented, May 18, 2022

Hi @virginiafdez ,

As I said in the previous comment, you can change the pickle protocol by setting the MONAI dataset arg: https://github.com/Project-MONAI/MONAI/blob/dev/monai/data/dataset.py#L207 Please give that a try first; if the issue still exists, let’s analyze further.

Thanks.
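The argument Nic-Ma links to controls which pickle protocol is used when the cache files are serialized. As a stdlib-only illustration of what switching protocol means (the dict below is a made-up stand-in shaped like the cached items described in the issue, not MONAI data):

```python
import pickle

# A cache item shaped like the one in the issue: file paths, a small list
# standing in for the numpy array, an integer and a string.
item = {"img": "case01_img.npz", "label": "case01_lbl.npz",
        "meta": [0.5, 1.0], "slice_id": 7, "subject": "case01"}

# An older protocol and the newest one available; both round-trip the same
# data, but the on-disk byte format (and size) differs.
for proto in (2, pickle.HIGHEST_PROTOCOL):
    blob = pickle.dumps(item, protocol=proto)
    assert pickle.loads(blob) == item  # round-trips under either protocol
    print(proto, len(blob))
```

Changing the protocol changes the bytes written to the cache, which is why it can sidestep a deserialization problem tied to one particular format.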

0 reactions
Nic-Ma commented, May 24, 2022

Cool, glad to see that.

Thanks.


