
[Bug] ValueError: Cannot load file containing pickled data when allow_pickle=False

See original GitHub issue

Describe the bug

I had been training Tacotron 2 for a while, and now I want to add sample audio for one speaker. When I run:

CUDA_VISIBLE_DEVICES=0 python train.py --continue_path /media/DATA-2/TTS/TTS_Coqui/TTS/running-July-28-2022_09+54AM-68cef28a

I got an error like this:

 > Number of output frames: 2

 > EPOCH: 0/1000
 --> /media/DATA-2/TTS/TTS_Coqui/TTS-July-28-2022_09+54AM-68cef28a


> DataLoader initialization
| > Tokenizer:
	| > add_blank: False
	| > use_eos_bos: False
	| > use_phonemes: True
	| > phonemizer:
		| > phoneme language: en-us
		| > phoneme backend: gruut
| > Number of instances : 23359
 | > Preprocessing samples
 | > Max text length: 239
 | > Min text length: 4
 | > Avg text length: 86.08806027655294
 | 
 | > Max audio length: 1145718.0
 | > Min audio length: 11868.0
 | > Avg audio length: 519904.13767712656
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.

 > TRAINING (2022-09-01 11:28:31) 
/media/DATA-2/TTS/TTS_Coqui/TTS/TTS/tts/models/tacotron2.py:333: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  ) // self.decoder.r
/media/DATA-2/TTS/TTS_Coqui/coqui_env/lib/python3.7/site-packages/torch/functional.py:568: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at  ../aten/src/ATen/native/TensorShape.cpp:2228.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
/media/DATA-2/TTS/TTS_Coqui/TTS/TTS/tts/models/tacotron2.py:335: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  alignment_lengths = mel_lengths // self.decoder.r

   --> STEP: 9/5840 -- GLOBAL_STEP: 1690010
     | > decoder_loss: 1.35190  (2.06165)
     | > postnet_loss: 1.23185  (1.89519)
     | > stopnet_loss: 0.45206  (0.54466)
     | > decoder_coarse_loss: 1.96557  (2.80050)
     | > decoder_ddc_loss: 0.05431  (0.06398)
     | > ga_loss: 0.00554  (0.01036)
     | > decoder_diff_spec_loss: 0.46238  (0.58947)
     | > postnet_diff_spec_loss: 0.40906  (0.52605)
     | > decoder_ssim_loss: 0.48877  (0.48201)
     | > postnet_ssim_loss: 0.45778  (0.45322)
     | > loss: 2.08516  (2.81450)
     | > align_error: 0.38218  (0.36455)
     | > grad_norm: 11.03733  (13.36171)
     | > current_lr: 0.00000 
     | > step_time: 0.16360  (0.17053)
     | > loader_time: 0.00130  (0.00129)


   --> STEP: 19/5840 -- GLOBAL_STEP: 1690020
     | > decoder_loss: 1.26435  (2.00329)
     | > postnet_loss: 1.14596  (1.83944)
     | > stopnet_loss: 0.15051  (0.49044)
     | > decoder_coarse_loss: 1.96471  (2.79364)
     | > decoder_ddc_loss: 0.03852  (0.05443)
     | > ga_loss: 0.00158  (0.00696)
     | > decoder_diff_spec_loss: 0.44740  (0.57787)
     | > postnet_diff_spec_loss: 0.39480  (0.51306)
     | > decoder_ssim_loss: 0.43631  (0.47875)
     | > postnet_ssim_loss: 0.40454  (0.44884)
     | > loss: 1.68256  (2.70255)
     | > align_error: 0.32000  (0.36616)
     | > grad_norm: 6.11971  (12.52853)
     | > current_lr: 0.00000 
     | > step_time: 0.22500  (0.19586)
     | > loader_time: 0.00150  (0.00125)

 ! Run is kept in /media/DATA-2/TTS/TTS_Coqui/TTS-July-28-2022_09+54AM-68cef28a
Traceback (most recent call last):
  File "/media/DATA-2/TTS/TTS_Coqui/coqui_env/lib/python3.7/site-packages/trainer/trainer.py", line 1492, in fit
    self._fit()
  File "/media/DATA-2/TTS/TTS_Coqui/coqui_env/lib/python3.7/site-packages/trainer/trainer.py", line 1476, in _fit
    self.train_epoch()
  File "/media/DATA-2/TTS/TTS_Coqui/coqui_env/lib/python3.7/site-packages/trainer/trainer.py", line 1254, in train_epoch
    for cur_step, batch in enumerate(self.train_loader):
  File "/media/DATA-2/TTS/TTS_Coqui/coqui_env/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 530, in __next__
    data = self._next_data()
  File "/media/DATA-2/TTS/TTS_Coqui/coqui_env/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1224, in _next_data
    return self._process_data(data)
  File "/media/DATA-2/TTS/TTS_Coqui/coqui_env/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1250, in _process_data
    data.reraise()
  File "/media/DATA-2/TTS/TTS_Coqui/coqui_env/lib/python3.7/site-packages/torch/_utils.py", line 457, in reraise
    raise exception
ValueError: Caught ValueError in DataLoader worker process 1.
Original Traceback (most recent call last):
  File "/media/DATA-2/TTS/TTS_Coqui/coqui_env/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/media/DATA-2/TTS/TTS_Coqui/coqui_env/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/media/DATA-2/TTS/TTS_Coqui/coqui_env/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/media/DATA-2/TTS/TTS_Coqui/TTS/TTS/tts/datasets/dataset.py", line 180, in __getitem__
    return self.load_data(idx)
  File "/media/DATA-2/TTS/TTS_Coqui/TTS/TTS/tts/datasets/dataset.py", line 230, in load_data
    token_ids = self.get_token_ids(idx, item["text"])
  File "/media/DATA-2/TTS/TTS_Coqui/TTS/TTS/tts/datasets/dataset.py", line 213, in get_token_ids
    token_ids = self.get_phonemes(idx, text)["token_ids"]
  File "/media/DATA-2/TTS/TTS_Coqui/TTS/TTS/tts/datasets/dataset.py", line 196, in get_phonemes
    out_dict = self.phoneme_dataset[idx]
  File "/media/DATA-2/TTS/TTS_Coqui/TTS/TTS/tts/datasets/dataset.py", line 563, in __getitem__
    ids = self.compute_or_load(item["audio_file"], item["text"])
  File "/media/DATA-2/TTS/TTS_Coqui/TTS/TTS/tts/datasets/dataset.py", line 579, in compute_or_load
    ids = np.load(cache_path)
  File "/media/DATA-2/TTS/TTS_Coqui/coqui_env/lib/python3.7/site-packages/numpy/lib/npyio.py", line 445, in load
    raise ValueError("Cannot load file containing pickled data "
ValueError: Cannot load file containing pickled data when allow_pickle=False
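
The error itself comes straight from NumPy: since version 1.16.3, np.load() defaults to allow_pickle=False and refuses any .npy file that stores pickled (object-array) data. A minimal, self-contained reproduction, not taken from the issue:

import numpy as np

# A dict-valued entry forces an object array, which np.save can only
# store by pickling it.
arr = np.array([{"token_ids": [1, 2, 3]}], dtype=object)
np.save("cache_example.npy", arr)

try:
    np.load("cache_example.npy")  # allow_pickle defaults to False
except ValueError as err:
    print(err)  # Cannot load file containing pickled data when allow_pickle=False

data = np.load("cache_example.npy", allow_pickle=True)  # loads fine
print(data[0]["token_ids"])  # [1, 2, 3]

In this case the cached phoneme files apparently contain such data; the fixes discussed in the comments below either regenerate the cache or opt back into pickle loading.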

Environment

{
  "CUDA": {
    "GPU": ["NVIDIA GeForce GTX 1660 Ti"],
    "available": true,
    "version": "10.2"
  },
  "Packages": {
    "PyTorch_debug": false,
    "PyTorch_version": "1.11.0+cu102",
    "TTS": "0.6.1",
    "numpy": "1.19.5"
  },
  "System": {
    "OS": "Linux",
    "architecture": ["64bit", "ELF"],
    "processor": "x86_64",
    "python": "3.8.0",
    "version": "#118~18.04.1-Ubuntu SMP Thu Mar 3 13:53:15 UTC 2022"
  }
}

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 6 (2 by maintainers)

Top GitHub Comments

1 reaction
kin0303 commented on Sep 9, 2022

I solved this problem. If you run into it, you can try one of the following:

  1. Move the cache folder temporarily to a different location and let it rebuild.
  2. Add allow_pickle=True to np.load(cache_path), i.e. np.load(cache_path, allow_pickle=True) at /media/DATA-2/TTS/TTS_Coqui/TTS/TTS/tts/datasets/dataset.py, line 579 (see the sketch after this list).
  3. Or read this issue: https://github.com/coqui-ai/TTS/issues/1624
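
For illustration, here is a minimal sketch of option 2; the wrapper function is hypothetical, but the np.load call matches the one at the failing line:

import numpy as np

def load_phoneme_cache(cache_path):
    # Hypothetical wrapper around the np.load call at dataset.py, line 579.
    # np.load() has defaulted to allow_pickle=False since NumPy 1.16.3, so a
    # cached .npy holding pickled (object-array) data raises the ValueError
    # above. Only enable pickling for cache files you generated yourself;
    # pickle is not safe on untrusted data.
    return np.load(cache_path, allow_pickle=True)

Rebuilding the cache (option 1) is arguably the safer route, since allow_pickle=True loads a stale or corrupted entry rather than regenerating it.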
0 reactions
kin0303 commented on Sep 2, 2022

@blackmamba1122 Looks like some of the cached phonemes are corrupted. You need to delete the cache directory or change the phoneme cache directory (the “phoneme_cache_path” parameter in the config) to force TTS to recompute it.
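
For reference, a minimal sketch of that cache reset; the cache location below is an assumption, so substitute whatever your config's “phoneme_cache_path” points to:

import shutil
from pathlib import Path

# Assumed location; read the real one from the "phoneme_cache_path" config entry.
cache_dir = Path("/media/DATA-2/TTS/TTS_Coqui/TTS/phoneme_cache")

if cache_dir.exists():
    # Move the cache aside rather than deleting it, so it can still be
    # inspected; training rebuilds it on the next run.
    shutil.move(str(cache_dir), str(cache_dir) + ".bak")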

I’ll try this one and report back.

Still getting an error:

   --> STEP: 1209/5840 -- GLOBAL_STEP: 1691210
     | > decoder_loss: 0.58440  (0.80216)
     | > postnet_loss: 0.51266  (0.72178)
     | > stopnet_loss: 0.84992  (0.29996)
     | > decoder_coarse_loss: 0.89247  (1.24166)
     | > decoder_ddc_loss: 0.00162  (0.00863)
     | > ga_loss: 0.00004  (0.00034)
     | > decoder_diff_spec_loss: 0.37415  (0.39893)
     | > postnet_diff_spec_loss: 0.33291  (0.35312)
     | > decoder_ssim_loss: 0.12760  (0.25786)
     | > postnet_ssim_loss: 0.11682  (0.23833)
     | > loss: 1.58579  (1.30728)
     | > align_error: 0.60102  (0.41433)
     | > grad_norm: 4.16512  (4.11554)
     | > current_lr: 0.00000 
     | > step_time: 2.44260  (1.16248)
     | > loader_time: 0.00260  (0.00190)

 ! Run is kept in /media/DATA-2/TTS/TTS_Coqui/TTS-July-28-2022_09+54AM-68cef28a
Traceback (most recent call last):
  File "/media/DATA-2/TTS/TTS_Coqui/coqui_env/lib/python3.7/site-packages/trainer/trainer.py", line 1492, in fit
    self._fit()
  File "/media/DATA-2/TTS/TTS_Coqui/coqui_env/lib/python3.7/site-packages/trainer/trainer.py", line 1476, in _fit
    self.train_epoch()
  File "/media/DATA-2/TTS/TTS_Coqui/coqui_env/lib/python3.7/site-packages/trainer/trainer.py", line 1254, in train_epoch
    for cur_step, batch in enumerate(self.train_loader):
  File "/media/DATA-2/TTS/TTS_Coqui/coqui_env/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 530, in __next__
    data = self._next_data()
  File "/media/DATA-2/TTS/TTS_Coqui/coqui_env/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1204, in _next_data
    return self._process_data(data)
  File "/media/DATA-2/TTS/TTS_Coqui/coqui_env/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1250, in _process_data
    data.reraise()
  File "/media/DATA-2/TTS/TTS_Coqui/coqui_env/lib/python3.7/site-packages/torch/_utils.py", line 457, in reraise
    raise exception
AssertionError: Caught AssertionError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/media/DATA-2/TTS/TTS_Coqui/coqui_env/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/media/DATA-2/TTS/TTS_Coqui/coqui_env/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/media/DATA-2/TTS/TTS_Coqui/coqui_env/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/media/DATA-2/TTS/TTS_Coqui/TTS/TTS/tts/datasets/dataset.py", line 180, in __getitem__
    return self.load_data(idx)
  File "/media/DATA-2/TTS/TTS_Coqui/TTS/TTS/tts/datasets/dataset.py", line 230, in load_data
    token_ids = self.get_token_ids(idx, item["text"])
  File "/media/DATA-2/TTS/TTS_Coqui/TTS/TTS/tts/datasets/dataset.py", line 213, in get_token_ids
    token_ids = self.get_phonemes(idx, text)["token_ids"]
  File "/media/DATA-2/TTS/TTS_Coqui/TTS/TTS/tts/datasets/dataset.py", line 198, in get_phonemes
    assert len(out_dict["token_ids"]) > 0
AssertionError
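
This assertion fires when a cached entry comes back with no token IDs, i.e. the phonemizer produced (or the cache stored) an empty sequence for some sample. As a hedged diagnostic, assuming each cache file holds the token-id array for one utterance (the layout is not confirmed by the issue), one could scan the cache for empty or unreadable entries:

import numpy as np
from pathlib import Path

# Assumed location; use the "phoneme_cache_path" value from your config.
cache_dir = Path("/media/DATA-2/TTS/TTS_Coqui/TTS/phoneme_cache")

for npy_file in sorted(cache_dir.glob("*.npy")):
    try:
        token_ids = np.load(npy_file, allow_pickle=True)
    except (ValueError, OSError) as err:
        print(f"unreadable cache entry: {npy_file} ({err})")
        continue
    if token_ids.size == 0:  # would trip the len(...) > 0 assertion above
        print(f"empty cache entry: {npy_file}")

Deleting whichever files this flags (or the whole cache directory) and letting training regenerate them should clear the assertion.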
