
Loading sampler state dict error

See original GitHub issue

I am resuming training of the WenetSpeech model by specifying --start-batch 24000; the command is listed below:

python3 ./pruned_transducer_stateless5/train.py \
  --start-batch 24000 \
  --use-fp16 True \
  --lang-dir data/lang_char \
  --exp-dir pruned_transducer_stateless5/exp_L_streaming \
  --world-size 8 \
  --num-epochs 15 \
  --start-epoch 1 \
  --max-duration 140 \
  --valid-interval 3000 \
  --model-warm-step 3000 \
  --save-every-n 2000 \
  --average-period 1000 \
  --training-subset L \
  --dynamic-chunk-training True \
  --causal-convolution True \
  --short-chunk-size 25 \
  --num-left-chunks 4

The train_dl.sampler.load_state_dict(sampler_state_dict) call takes about three hours, and then I get the following error while loading:

2022-08-08 16:47:14,334 INFO [train.py:943] (0/8) Training started
2022-08-08 16:47:14,334 INFO [train.py:943] (7/8) Training started
2022-08-08 16:47:14,334 INFO [train.py:953] (7/8) Device: cuda:7
2022-08-08 16:47:14,335 INFO [train.py:943] (5/8) Training started
2022-08-08 16:47:14,336 INFO [train.py:953] (5/8) Device: cuda:5
2022-08-08 16:47:14,337 INFO [train.py:953] (0/8) Device: cuda:0
2022-08-08 16:47:14,340 INFO [train.py:943] (6/8) Training started
2022-08-08 16:47:14,340 INFO [train.py:953] (6/8) Device: cuda:6
2022-08-08 16:47:14,340 INFO [train.py:943] (1/8) Training started
2022-08-08 16:47:14,341 INFO [train.py:953] (1/8) Device: cuda:1
2022-08-08 16:47:14,341 INFO [train.py:943] (4/8) Training started
2022-08-08 16:47:14,341 INFO [train.py:953] (4/8) Device: cuda:4
2022-08-08 16:47:14,342 INFO [train.py:943] (2/8) Training started
2022-08-08 16:47:14,342 INFO [train.py:953] (2/8) Device: cuda:2
2022-08-08 16:47:14,342 INFO [train.py:943] (3/8) Training started
2022-08-08 16:47:14,343 INFO [train.py:953] (3/8) Device: cuda:3
2022-08-08 16:47:16,459 INFO [lexicon.py:176] (2/8) Loading pre-compiled data/lang_char/Linv.pt
2022-08-08 16:47:16,476 INFO [lexicon.py:176] (6/8) Loading pre-compiled data/lang_char/Linv.pt
2022-08-08 16:47:16,547 INFO [lexicon.py:176] (0/8) Loading pre-compiled data/lang_char/Linv.pt
2022-08-08 16:47:16,552 INFO [lexicon.py:176] (3/8) Loading pre-compiled data/lang_char/Linv.pt
2022-08-08 16:47:16,554 INFO [lexicon.py:176] (4/8) Loading pre-compiled data/lang_char/Linv.pt
2022-08-08 16:47:16,555 INFO [lexicon.py:176] (1/8) Loading pre-compiled data/lang_char/Linv.pt
2022-08-08 16:47:16,563 INFO [lexicon.py:176] (7/8) Loading pre-compiled data/lang_char/Linv.pt
2022-08-08 16:47:16,576 INFO [lexicon.py:176] (5/8) Loading pre-compiled data/lang_char/Linv.pt
2022-08-08 16:47:16,686 INFO [train.py:969] (2/8) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'feature_dim': 80, 'subsampling_factor': 4, 'env_info': {'k2-version': '1.17', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '7dcabf85e8bf06984c4abab0400ef1322b5ff3df', 'k2-git-date': 'Tue Aug 2 21:22:39 2022', 'lhotse-version': '1.5.0.dev+git.08a613a.clean', 'torch-version': '1.12.0', 'torch-cuda-available': True, 'torch-cuda-version': '10.2', 'python-version': '3.9', 'icefall-git-branch': 'master', 'icefall-git-sha1': 'f24b76e-dirty', 'icefall-git-date': 'Sat Aug 6 18:33:43 2022', 'icefall-path': '/home/storage04/zhuangweiji/workspace/kaldi2/icefall', 'k2-path': '/home/storage04/zhuangweiji/tools/anaconda3/envs/k2-py39-cuda10.2-torch1.12/lib/python3.9/site-packages/k2-1.17.dev20220803+cuda10.2.torch1.12.0-py3.9-linux-x86_64.egg/k2/__init__.py', 'lhotse-path': '/home/storage04/zhuangweiji/workspace/kaldi2/lhotse/lhotse/__init__.py', 'hostname': 'tj1-asr-train-v100-01.kscn', 'IP address': '10.38.10.45'}, 'world_size': 8, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 15, 'start_epoch': 1, 'start_batch': 24000, 'exp_dir': PosixPath('pruned_transducer_stateless5/exp_L_streaming'), 'lang_dir': PosixPath('data/lang_char'), 'initial_lr': 0.003, 'lr_batches': 5000, 'lr_epochs': 6, 'context_size': 2, 'prune_range': 5, 'lm_scale': 0.25, 'am_scale': 0.0, 'simple_loss_scale': 0.5, 'seed': 42, 'print_diagnostics': False, 'save_every_n': 2000, 'keep_last_k': 30, 'average_period': 1000, 'use_fp16': True, 'valid_interval': 3000, 'model_warm_step': 3000, 'num_encoder_layers': 24, 'dim_feedforward': 1536, 'nhead': 8, 'encoder_dim': 384, 'decoder_dim': 512, 'joiner_dim': 512, 'dynamic_chunk_training': True, 'causal_convolution': True, 'short_chunk_size': 25, 'num_left_chunks': 4, 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 140, 
'bucketing_sampler': True, 'num_buckets': 300, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'training_subset': 'L', 'blank_id': 0, 'vocab_size': 5537}
2022-08-08 16:47:16,686 INFO [train.py:971] (2/8) About to create model
2022-08-08 16:47:16,697 INFO [train.py:969] (6/8) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'feature_dim': 80, 'subsampling_factor': 4, 'env_info': {'k2-version': '1.17', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '7dcabf85e8bf06984c4abab0400ef1322b5ff3df', 'k2-git-date': 'Tue Aug 2 21:22:39 2022', 'lhotse-version': '1.5.0.dev+git.08a613a.clean', 'torch-version': '1.12.0', 'torch-cuda-available': True, 'torch-cuda-version': '10.2', 'python-version': '3.9', 'icefall-git-branch': 'master', 'icefall-git-sha1': 'f24b76e-dirty', 'icefall-git-date': 'Sat Aug 6 18:33:43 2022', 'icefall-path': '/home/storage04/zhuangweiji/workspace/kaldi2/icefall', 'k2-path': '/home/storage04/zhuangweiji/tools/anaconda3/envs/k2-py39-cuda10.2-torch1.12/lib/python3.9/site-packages/k2-1.17.dev20220803+cuda10.2.torch1.12.0-py3.9-linux-x86_64.egg/k2/__init__.py', 'lhotse-path': '/home/storage04/zhuangweiji/workspace/kaldi2/lhotse/lhotse/__init__.py', 'hostname': 'tj1-asr-train-v100-01.kscn', 'IP address': '10.38.10.45'}, 'world_size': 8, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 15, 'start_epoch': 1, 'start_batch': 24000, 'exp_dir': PosixPath('pruned_transducer_stateless5/exp_L_streaming'), 'lang_dir': PosixPath('data/lang_char'), 'initial_lr': 0.003, 'lr_batches': 5000, 'lr_epochs': 6, 'context_size': 2, 'prune_range': 5, 'lm_scale': 0.25, 'am_scale': 0.0, 'simple_loss_scale': 0.5, 'seed': 42, 'print_diagnostics': False, 'save_every_n': 2000, 'keep_last_k': 30, 'average_period': 1000, 'use_fp16': True, 'valid_interval': 3000, 'model_warm_step': 3000, 'num_encoder_layers': 24, 'dim_feedforward': 1536, 'nhead': 8, 'encoder_dim': 384, 'decoder_dim': 512, 'joiner_dim': 512, 'dynamic_chunk_training': True, 'causal_convolution': True, 'short_chunk_size': 25, 'num_left_chunks': 4, 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 140, 
'bucketing_sampler': True, 'num_buckets': 300, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'training_subset': 'L', 'blank_id': 0, 'vocab_size': 5537}
2022-08-08 16:47:16,698 INFO [train.py:971] (6/8) About to create model
2022-08-08 16:47:16,771 INFO [train.py:969] (4/8) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'feature_dim': 80, 'subsampling_factor': 4, 'env_info': {'k2-version': '1.17', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '7dcabf85e8bf06984c4abab0400ef1322b5ff3df', 'k2-git-date': 'Tue Aug 2 21:22:39 2022', 'lhotse-version': '1.5.0.dev+git.08a613a.clean', 'torch-version': '1.12.0', 'torch-cuda-available': True, 'torch-cuda-version': '10.2', 'python-version': '3.9', 'icefall-git-branch': 'master', 'icefall-git-sha1': 'f24b76e-dirty', 'icefall-git-date': 'Sat Aug 6 18:33:43 2022', 'icefall-path': '/home/storage04/zhuangweiji/workspace/kaldi2/icefall', 'k2-path': '/home/storage04/zhuangweiji/tools/anaconda3/envs/k2-py39-cuda10.2-torch1.12/lib/python3.9/site-packages/k2-1.17.dev20220803+cuda10.2.torch1.12.0-py3.9-linux-x86_64.egg/k2/__init__.py', 'lhotse-path': '/home/storage04/zhuangweiji/workspace/kaldi2/lhotse/lhotse/__init__.py', 'hostname': 'tj1-asr-train-v100-01.kscn', 'IP address': '10.38.10.45'}, 'world_size': 8, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 15, 'start_epoch': 1, 'start_batch': 24000, 'exp_dir': PosixPath('pruned_transducer_stateless5/exp_L_streaming'), 'lang_dir': PosixPath('data/lang_char'), 'initial_lr': 0.003, 'lr_batches': 5000, 'lr_epochs': 6, 'context_size': 2, 'prune_range': 5, 'lm_scale': 0.25, 'am_scale': 0.0, 'simple_loss_scale': 0.5, 'seed': 42, 'print_diagnostics': False, 'save_every_n': 2000, 'keep_last_k': 30, 'average_period': 1000, 'use_fp16': True, 'valid_interval': 3000, 'model_warm_step': 3000, 'num_encoder_layers': 24, 'dim_feedforward': 1536, 'nhead': 8, 'encoder_dim': 384, 'decoder_dim': 512, 'joiner_dim': 512, 'dynamic_chunk_training': True, 'causal_convolution': True, 'short_chunk_size': 25, 'num_left_chunks': 4, 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 140, 
'bucketing_sampler': True, 'num_buckets': 300, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'training_subset': 'L', 'blank_id': 0, 'vocab_size': 5537}
2022-08-08 16:47:16,772 INFO [train.py:971] (4/8) About to create model
2022-08-08 16:47:16,775 INFO [train.py:969] (3/8) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'feature_dim': 80, 'subsampling_factor': 4, 'env_info': {'k2-version': '1.17', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '7dcabf85e8bf06984c4abab0400ef1322b5ff3df', 'k2-git-date': 'Tue Aug 2 21:22:39 2022', 'lhotse-version': '1.5.0.dev+git.08a613a.clean', 'torch-version': '1.12.0', 'torch-cuda-available': True, 'torch-cuda-version': '10.2', 'python-version': '3.9', 'icefall-git-branch': 'master', 'icefall-git-sha1': 'f24b76e-dirty', 'icefall-git-date': 'Sat Aug 6 18:33:43 2022', 'icefall-path': '/home/storage04/zhuangweiji/workspace/kaldi2/icefall', 'k2-path': '/home/storage04/zhuangweiji/tools/anaconda3/envs/k2-py39-cuda10.2-torch1.12/lib/python3.9/site-packages/k2-1.17.dev20220803+cuda10.2.torch1.12.0-py3.9-linux-x86_64.egg/k2/__init__.py', 'lhotse-path': '/home/storage04/zhuangweiji/workspace/kaldi2/lhotse/lhotse/__init__.py', 'hostname': 'tj1-asr-train-v100-01.kscn', 'IP address': '10.38.10.45'}, 'world_size': 8, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 15, 'start_epoch': 1, 'start_batch': 24000, 'exp_dir': PosixPath('pruned_transducer_stateless5/exp_L_streaming'), 'lang_dir': PosixPath('data/lang_char'), 'initial_lr': 0.003, 'lr_batches': 5000, 'lr_epochs': 6, 'context_size': 2, 'prune_range': 5, 'lm_scale': 0.25, 'am_scale': 0.0, 'simple_loss_scale': 0.5, 'seed': 42, 'print_diagnostics': False, 'save_every_n': 2000, 'keep_last_k': 30, 'average_period': 1000, 'use_fp16': True, 'valid_interval': 3000, 'model_warm_step': 3000, 'num_encoder_layers': 24, 'dim_feedforward': 1536, 'nhead': 8, 'encoder_dim': 384, 'decoder_dim': 512, 'joiner_dim': 512, 'dynamic_chunk_training': True, 'causal_convolution': True, 'short_chunk_size': 25, 'num_left_chunks': 4, 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 140, 
'bucketing_sampler': True, 'num_buckets': 300, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'training_subset': 'L', 'blank_id': 0, 'vocab_size': 5537}
2022-08-08 16:47:16,775 INFO [train.py:971] (3/8) About to create model
2022-08-08 16:47:16,775 INFO [train.py:969] (1/8) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'feature_dim': 80, 'subsampling_factor': 4, 'env_info': {'k2-version': '1.17', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '7dcabf85e8bf06984c4abab0400ef1322b5ff3df', 'k2-git-date': 'Tue Aug 2 21:22:39 2022', 'lhotse-version': '1.5.0.dev+git.08a613a.clean', 'torch-version': '1.12.0', 'torch-cuda-available': True, 'torch-cuda-version': '10.2', 'python-version': '3.9', 'icefall-git-branch': 'master', 'icefall-git-sha1': 'f24b76e-dirty', 'icefall-git-date': 'Sat Aug 6 18:33:43 2022', 'icefall-path': '/home/storage04/zhuangweiji/workspace/kaldi2/icefall', 'k2-path': '/home/storage04/zhuangweiji/tools/anaconda3/envs/k2-py39-cuda10.2-torch1.12/lib/python3.9/site-packages/k2-1.17.dev20220803+cuda10.2.torch1.12.0-py3.9-linux-x86_64.egg/k2/__init__.py', 'lhotse-path': '/home/storage04/zhuangweiji/workspace/kaldi2/lhotse/lhotse/__init__.py', 'hostname': 'tj1-asr-train-v100-01.kscn', 'IP address': '10.38.10.45'}, 'world_size': 8, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 15, 'start_epoch': 1, 'start_batch': 24000, 'exp_dir': PosixPath('pruned_transducer_stateless5/exp_L_streaming'), 'lang_dir': PosixPath('data/lang_char'), 'initial_lr': 0.003, 'lr_batches': 5000, 'lr_epochs': 6, 'context_size': 2, 'prune_range': 5, 'lm_scale': 0.25, 'am_scale': 0.0, 'simple_loss_scale': 0.5, 'seed': 42, 'print_diagnostics': False, 'save_every_n': 2000, 'keep_last_k': 30, 'average_period': 1000, 'use_fp16': True, 'valid_interval': 3000, 'model_warm_step': 3000, 'num_encoder_layers': 24, 'dim_feedforward': 1536, 'nhead': 8, 'encoder_dim': 384, 'decoder_dim': 512, 'joiner_dim': 512, 'dynamic_chunk_training': True, 'causal_convolution': True, 'short_chunk_size': 25, 'num_left_chunks': 4, 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 140, 
'bucketing_sampler': True, 'num_buckets': 300, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'training_subset': 'L', 'blank_id': 0, 'vocab_size': 5537}
2022-08-08 16:47:16,776 INFO [train.py:971] (1/8) About to create model
2022-08-08 16:47:16,776 INFO [train.py:969] (0/8) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'feature_dim': 80, 'subsampling_factor': 4, 'env_info': {'k2-version': '1.17', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '7dcabf85e8bf06984c4abab0400ef1322b5ff3df', 'k2-git-date': 'Tue Aug 2 21:22:39 2022', 'lhotse-version': '1.5.0.dev+git.08a613a.clean', 'torch-version': '1.12.0', 'torch-cuda-available': True, 'torch-cuda-version': '10.2', 'python-version': '3.9', 'icefall-git-branch': 'master', 'icefall-git-sha1': 'f24b76e-dirty', 'icefall-git-date': 'Sat Aug 6 18:33:43 2022', 'icefall-path': '/home/storage04/zhuangweiji/workspace/kaldi2/icefall', 'k2-path': '/home/storage04/zhuangweiji/tools/anaconda3/envs/k2-py39-cuda10.2-torch1.12/lib/python3.9/site-packages/k2-1.17.dev20220803+cuda10.2.torch1.12.0-py3.9-linux-x86_64.egg/k2/__init__.py', 'lhotse-path': '/home/storage04/zhuangweiji/workspace/kaldi2/lhotse/lhotse/__init__.py', 'hostname': 'tj1-asr-train-v100-01.kscn', 'IP address': '10.38.10.45'}, 'world_size': 8, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 15, 'start_epoch': 1, 'start_batch': 24000, 'exp_dir': PosixPath('pruned_transducer_stateless5/exp_L_streaming'), 'lang_dir': PosixPath('data/lang_char'), 'initial_lr': 0.003, 'lr_batches': 5000, 'lr_epochs': 6, 'context_size': 2, 'prune_range': 5, 'lm_scale': 0.25, 'am_scale': 0.0, 'simple_loss_scale': 0.5, 'seed': 42, 'print_diagnostics': False, 'save_every_n': 2000, 'keep_last_k': 30, 'average_period': 1000, 'use_fp16': True, 'valid_interval': 3000, 'model_warm_step': 3000, 'num_encoder_layers': 24, 'dim_feedforward': 1536, 'nhead': 8, 'encoder_dim': 384, 'decoder_dim': 512, 'joiner_dim': 512, 'dynamic_chunk_training': True, 'causal_convolution': True, 'short_chunk_size': 25, 'num_left_chunks': 4, 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 140, 
'bucketing_sampler': True, 'num_buckets': 300, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'training_subset': 'L', 'blank_id': 0, 'vocab_size': 5537}
2022-08-08 16:47:16,776 INFO [train.py:971] (0/8) About to create model
2022-08-08 16:47:16,787 INFO [train.py:969] (7/8) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'feature_dim': 80, 'subsampling_factor': 4, 'env_info': {'k2-version': '1.17', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '7dcabf85e8bf06984c4abab0400ef1322b5ff3df', 'k2-git-date': 'Tue Aug 2 21:22:39 2022', 'lhotse-version': '1.5.0.dev+git.08a613a.clean', 'torch-version': '1.12.0', 'torch-cuda-available': True, 'torch-cuda-version': '10.2', 'python-version': '3.9', 'icefall-git-branch': 'master', 'icefall-git-sha1': 'f24b76e-dirty', 'icefall-git-date': 'Sat Aug 6 18:33:43 2022', 'icefall-path': '/home/storage04/zhuangweiji/workspace/kaldi2/icefall', 'k2-path': '/home/storage04/zhuangweiji/tools/anaconda3/envs/k2-py39-cuda10.2-torch1.12/lib/python3.9/site-packages/k2-1.17.dev20220803+cuda10.2.torch1.12.0-py3.9-linux-x86_64.egg/k2/__init__.py', 'lhotse-path': '/home/storage04/zhuangweiji/workspace/kaldi2/lhotse/lhotse/__init__.py', 'hostname': 'tj1-asr-train-v100-01.kscn', 'IP address': '10.38.10.45'}, 'world_size': 8, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 15, 'start_epoch': 1, 'start_batch': 24000, 'exp_dir': PosixPath('pruned_transducer_stateless5/exp_L_streaming'), 'lang_dir': PosixPath('data/lang_char'), 'initial_lr': 0.003, 'lr_batches': 5000, 'lr_epochs': 6, 'context_size': 2, 'prune_range': 5, 'lm_scale': 0.25, 'am_scale': 0.0, 'simple_loss_scale': 0.5, 'seed': 42, 'print_diagnostics': False, 'save_every_n': 2000, 'keep_last_k': 30, 'average_period': 1000, 'use_fp16': True, 'valid_interval': 3000, 'model_warm_step': 3000, 'num_encoder_layers': 24, 'dim_feedforward': 1536, 'nhead': 8, 'encoder_dim': 384, 'decoder_dim': 512, 'joiner_dim': 512, 'dynamic_chunk_training': True, 'causal_convolution': True, 'short_chunk_size': 25, 'num_left_chunks': 4, 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 140, 
'bucketing_sampler': True, 'num_buckets': 300, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'training_subset': 'L', 'blank_id': 0, 'vocab_size': 5537}
2022-08-08 16:47:16,788 INFO [train.py:971] (7/8) About to create model
2022-08-08 16:47:16,800 INFO [train.py:969] (5/8) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'feature_dim': 80, 'subsampling_factor': 4, 'env_info': {'k2-version': '1.17', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '7dcabf85e8bf06984c4abab0400ef1322b5ff3df', 'k2-git-date': 'Tue Aug 2 21:22:39 2022', 'lhotse-version': '1.5.0.dev+git.08a613a.clean', 'torch-version': '1.12.0', 'torch-cuda-available': True, 'torch-cuda-version': '10.2', 'python-version': '3.9', 'icefall-git-branch': 'master', 'icefall-git-sha1': 'f24b76e-dirty', 'icefall-git-date': 'Sat Aug 6 18:33:43 2022', 'icefall-path': '/home/storage04/zhuangweiji/workspace/kaldi2/icefall', 'k2-path': '/home/storage04/zhuangweiji/tools/anaconda3/envs/k2-py39-cuda10.2-torch1.12/lib/python3.9/site-packages/k2-1.17.dev20220803+cuda10.2.torch1.12.0-py3.9-linux-x86_64.egg/k2/__init__.py', 'lhotse-path': '/home/storage04/zhuangweiji/workspace/kaldi2/lhotse/lhotse/__init__.py', 'hostname': 'tj1-asr-train-v100-01.kscn', 'IP address': '10.38.10.45'}, 'world_size': 8, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 15, 'start_epoch': 1, 'start_batch': 24000, 'exp_dir': PosixPath('pruned_transducer_stateless5/exp_L_streaming'), 'lang_dir': PosixPath('data/lang_char'), 'initial_lr': 0.003, 'lr_batches': 5000, 'lr_epochs': 6, 'context_size': 2, 'prune_range': 5, 'lm_scale': 0.25, 'am_scale': 0.0, 'simple_loss_scale': 0.5, 'seed': 42, 'print_diagnostics': False, 'save_every_n': 2000, 'keep_last_k': 30, 'average_period': 1000, 'use_fp16': True, 'valid_interval': 3000, 'model_warm_step': 3000, 'num_encoder_layers': 24, 'dim_feedforward': 1536, 'nhead': 8, 'encoder_dim': 384, 'decoder_dim': 512, 'joiner_dim': 512, 'dynamic_chunk_training': True, 'causal_convolution': True, 'short_chunk_size': 25, 'num_left_chunks': 4, 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 140, 
'bucketing_sampler': True, 'num_buckets': 300, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'training_subset': 'L', 'blank_id': 0, 'vocab_size': 5537}
2022-08-08 16:47:16,800 INFO [train.py:971] (5/8) About to create model
2022-08-08 16:47:17,336 INFO [train.py:975] (2/8) Number of model parameters: 97487351
2022-08-08 16:47:17,337 INFO [checkpoint.py:112] (2/8) Loading checkpoint from pruned_transducer_stateless5/exp_L_streaming/checkpoint-24000.pt
2022-08-08 16:47:17,349 INFO [train.py:975] (6/8) Number of model parameters: 97487351
2022-08-08 16:47:17,350 INFO [checkpoint.py:112] (6/8) Loading checkpoint from pruned_transducer_stateless5/exp_L_streaming/checkpoint-24000.pt
2022-08-08 16:47:17,449 INFO [train.py:975] (0/8) Number of model parameters: 97487351
2022-08-08 16:47:17,450 INFO [train.py:975] (3/8) Number of model parameters: 97487351
2022-08-08 16:47:17,450 INFO [checkpoint.py:112] (3/8) Loading checkpoint from pruned_transducer_stateless5/exp_L_streaming/checkpoint-24000.pt
2022-08-08 16:47:17,450 INFO [train.py:975] (1/8) Number of model parameters: 97487351
2022-08-08 16:47:17,450 INFO [checkpoint.py:112] (1/8) Loading checkpoint from pruned_transducer_stateless5/exp_L_streaming/checkpoint-24000.pt
2022-08-08 16:47:17,453 INFO [train.py:975] (4/8) Number of model parameters: 97487351
2022-08-08 16:47:17,454 INFO [checkpoint.py:112] (4/8) Loading checkpoint from pruned_transducer_stateless5/exp_L_streaming/checkpoint-24000.pt
2022-08-08 16:47:17,472 INFO [train.py:975] (7/8) Number of model parameters: 97487351
2022-08-08 16:47:17,473 INFO [checkpoint.py:112] (7/8) Loading checkpoint from pruned_transducer_stateless5/exp_L_streaming/checkpoint-24000.pt
2022-08-08 16:47:17,475 INFO [train.py:975] (5/8) Number of model parameters: 97487351
2022-08-08 16:47:17,476 INFO [checkpoint.py:112] (5/8) Loading checkpoint from pruned_transducer_stateless5/exp_L_streaming/checkpoint-24000.pt
2022-08-08 16:47:17,856 INFO [checkpoint.py:112] (0/8) Loading checkpoint from pruned_transducer_stateless5/exp_L_streaming/checkpoint-24000.pt
2022-08-08 16:47:19,757 INFO [checkpoint.py:131] (0/8) Loading averaged model
2022-08-08 16:47:23,825 INFO [train.py:990] (0/8) Using DDP
2022-08-08 16:47:24,070 INFO [train.py:990] (4/8) Using DDP
2022-08-08 16:47:24,094 INFO [train.py:990] (2/8) Using DDP
2022-08-08 16:47:24,104 INFO [train.py:990] (7/8) Using DDP
2022-08-08 16:47:24,207 INFO [train.py:990] (1/8) Using DDP
2022-08-08 16:47:24,213 INFO [train.py:990] (6/8) Using DDP
2022-08-08 16:47:24,314 INFO [train.py:990] (3/8) Using DDP
2022-08-08 16:47:24,378 INFO [train.py:990] (5/8) Using DDP
2022-08-08 16:47:25,337 INFO [train.py:998] (0/8) Loading optimizer state dict
2022-08-08 16:47:25,343 INFO [train.py:998] (2/8) Loading optimizer state dict
2022-08-08 16:47:25,344 INFO [train.py:998] (1/8) Loading optimizer state dict
2022-08-08 16:47:25,344 INFO [train.py:998] (5/8) Loading optimizer state dict
2022-08-08 16:47:25,344 INFO [train.py:998] (6/8) Loading optimizer state dict
2022-08-08 16:47:25,345 INFO [train.py:998] (3/8) Loading optimizer state dict
2022-08-08 16:47:25,345 INFO [train.py:998] (7/8) Loading optimizer state dict
2022-08-08 16:47:25,345 INFO [train.py:998] (4/8) Loading optimizer state dict
2022-08-08 16:47:26,359 INFO [train.py:1006] (0/8) Loading scheduler state dict
2022-08-08 16:47:26,360 INFO [asr_datamodule.py:415] (0/8) About to get train cuts
2022-08-08 16:47:26,367 INFO [asr_datamodule.py:424] (0/8) About to get dev cuts
2022-08-08 16:47:26,369 INFO [asr_datamodule.py:347] (0/8) About to create dev dataset
2022-08-08 16:47:26,411 INFO [train.py:1006] (6/8) Loading scheduler state dict
2022-08-08 16:47:26,411 INFO [asr_datamodule.py:415] (6/8) About to get train cuts
2022-08-08 16:47:26,414 INFO [asr_datamodule.py:424] (6/8) About to get dev cuts
2022-08-08 16:47:26,415 INFO [asr_datamodule.py:347] (6/8) About to create dev dataset
2022-08-08 16:47:26,494 INFO [train.py:1006] (2/8) Loading scheduler state dict
2022-08-08 16:47:26,495 INFO [asr_datamodule.py:415] (2/8) About to get train cuts
2022-08-08 16:47:26,494 INFO [train.py:1006] (4/8) Loading scheduler state dict
2022-08-08 16:47:26,495 INFO [asr_datamodule.py:415] (4/8) About to get train cuts
2022-08-08 16:47:26,498 INFO [asr_datamodule.py:424] (2/8) About to get dev cuts
2022-08-08 16:47:26,498 INFO [asr_datamodule.py:424] (4/8) About to get dev cuts
2022-08-08 16:47:26,499 INFO [asr_datamodule.py:347] (2/8) About to create dev dataset
2022-08-08 16:47:26,499 INFO [asr_datamodule.py:347] (4/8) About to create dev dataset
2022-08-08 16:47:26,569 INFO [train.py:1006] (7/8) Loading scheduler state dict
2022-08-08 16:47:26,570 INFO [asr_datamodule.py:415] (7/8) About to get train cuts
2022-08-08 16:47:26,574 INFO [asr_datamodule.py:424] (7/8) About to get dev cuts
2022-08-08 16:47:26,575 INFO [asr_datamodule.py:347] (7/8) About to create dev dataset
2022-08-08 16:47:26,593 INFO [train.py:1006] (3/8) Loading scheduler state dict
2022-08-08 16:47:26,593 INFO [asr_datamodule.py:415] (3/8) About to get train cuts
2022-08-08 16:47:26,595 INFO [asr_datamodule.py:424] (3/8) About to get dev cuts
2022-08-08 16:47:26,596 INFO [asr_datamodule.py:347] (3/8) About to create dev dataset
2022-08-08 16:47:27,086 INFO [asr_datamodule.py:368] (0/8) About to create dev dataloader
2022-08-08 16:47:27,087 INFO [asr_datamodule.py:214] (0/8) About to get Musan cuts
2022-08-08 16:47:27,199 INFO [asr_datamodule.py:368] (6/8) About to create dev dataloader
2022-08-08 16:47:27,200 INFO [asr_datamodule.py:214] (6/8) About to get Musan cuts
2022-08-08 16:47:27,213 INFO [asr_datamodule.py:368] (2/8) About to create dev dataloader
2022-08-08 16:47:27,214 INFO [asr_datamodule.py:214] (2/8) About to get Musan cuts
2022-08-08 16:47:27,218 INFO [asr_datamodule.py:368] (4/8) About to create dev dataloader
2022-08-08 16:47:27,220 INFO [asr_datamodule.py:214] (4/8) About to get Musan cuts
2022-08-08 16:47:27,282 INFO [asr_datamodule.py:368] (7/8) About to create dev dataloader
2022-08-08 16:47:27,283 INFO [asr_datamodule.py:214] (7/8) About to get Musan cuts
2022-08-08 16:47:27,324 INFO [asr_datamodule.py:368] (3/8) About to create dev dataloader
2022-08-08 16:47:27,325 INFO [asr_datamodule.py:214] (3/8) About to get Musan cuts
2022-08-08 16:47:27,440 INFO [train.py:1006] (5/8) Loading scheduler state dict
2022-08-08 16:47:27,440 INFO [asr_datamodule.py:415] (5/8) About to get train cuts
2022-08-08 16:47:27,443 INFO [asr_datamodule.py:424] (5/8) About to get dev cuts
2022-08-08 16:47:27,444 INFO [asr_datamodule.py:347] (5/8) About to create dev dataset
2022-08-08 16:47:27,501 INFO [train.py:1006] (1/8) Loading scheduler state dict
2022-08-08 16:47:27,501 INFO [asr_datamodule.py:415] (1/8) About to get train cuts
2022-08-08 16:47:27,505 INFO [asr_datamodule.py:424] (1/8) About to get dev cuts
2022-08-08 16:47:27,506 INFO [asr_datamodule.py:347] (1/8) About to create dev dataset
2022-08-08 16:47:28,157 INFO [asr_datamodule.py:368] (5/8) About to create dev dataloader
2022-08-08 16:47:28,158 INFO [asr_datamodule.py:214] (5/8) About to get Musan cuts
2022-08-08 16:47:28,218 INFO [asr_datamodule.py:368] (1/8) About to create dev dataloader
2022-08-08 16:47:28,219 INFO [asr_datamodule.py:214] (1/8) About to get Musan cuts
2022-08-08 16:47:29,614 INFO [asr_datamodule.py:221] (0/8) Enable MUSAN
2022-08-08 16:47:29,614 INFO [asr_datamodule.py:246] (0/8) Enable SpecAugment
2022-08-08 16:47:29,614 INFO [asr_datamodule.py:247] (0/8) Time warp factor: 80
2022-08-08 16:47:29,615 INFO [asr_datamodule.py:259] (0/8) Num frame mask: 10
2022-08-08 16:47:29,615 INFO [asr_datamodule.py:272] (0/8) About to create train dataset
2022-08-08 16:47:29,615 INFO [asr_datamodule.py:300] (0/8) Using DynamicBucketingSampler.
2022-08-08 16:47:29,736 INFO [asr_datamodule.py:221] (2/8) Enable MUSAN
2022-08-08 16:47:29,737 INFO [asr_datamodule.py:246] (2/8) Enable SpecAugment
2022-08-08 16:47:29,737 INFO [asr_datamodule.py:247] (2/8) Time warp factor: 80
2022-08-08 16:47:29,737 INFO [asr_datamodule.py:259] (2/8) Num frame mask: 10
2022-08-08 16:47:29,737 INFO [asr_datamodule.py:272] (2/8) About to create train dataset
2022-08-08 16:47:29,737 INFO [asr_datamodule.py:300] (2/8) Using DynamicBucketingSampler.
2022-08-08 16:47:29,738 INFO [asr_datamodule.py:221] (6/8) Enable MUSAN
2022-08-08 16:47:29,738 INFO [asr_datamodule.py:246] (6/8) Enable SpecAugment
2022-08-08 16:47:29,738 INFO [asr_datamodule.py:247] (6/8) Time warp factor: 80
2022-08-08 16:47:29,739 INFO [asr_datamodule.py:259] (6/8) Num frame mask: 10
2022-08-08 16:47:29,739 INFO [asr_datamodule.py:272] (6/8) About to create train dataset
2022-08-08 16:47:29,739 INFO [asr_datamodule.py:300] (6/8) Using DynamicBucketingSampler.
2022-08-08 16:47:29,792 INFO [asr_datamodule.py:221] (4/8) Enable MUSAN
2022-08-08 16:47:29,792 INFO [asr_datamodule.py:246] (4/8) Enable SpecAugment
2022-08-08 16:47:29,792 INFO [asr_datamodule.py:247] (4/8) Time warp factor: 80
2022-08-08 16:47:29,792 INFO [asr_datamodule.py:259] (4/8) Num frame mask: 10
2022-08-08 16:47:29,792 INFO [asr_datamodule.py:272] (4/8) About to create train dataset
2022-08-08 16:47:29,792 INFO [asr_datamodule.py:300] (4/8) Using DynamicBucketingSampler.
2022-08-08 16:47:29,854 INFO [asr_datamodule.py:221] (7/8) Enable MUSAN
2022-08-08 16:47:29,855 INFO [asr_datamodule.py:246] (7/8) Enable SpecAugment
2022-08-08 16:47:29,855 INFO [asr_datamodule.py:247] (7/8) Time warp factor: 80
2022-08-08 16:47:29,855 INFO [asr_datamodule.py:259] (7/8) Num frame mask: 10
2022-08-08 16:47:29,855 INFO [asr_datamodule.py:272] (7/8) About to create train dataset
2022-08-08 16:47:29,855 INFO [asr_datamodule.py:300] (7/8) Using DynamicBucketingSampler.
2022-08-08 16:47:29,930 INFO [asr_datamodule.py:221] (3/8) Enable MUSAN
2022-08-08 16:47:29,930 INFO [asr_datamodule.py:246] (3/8) Enable SpecAugment
2022-08-08 16:47:29,930 INFO [asr_datamodule.py:247] (3/8) Time warp factor: 80
2022-08-08 16:47:29,930 INFO [asr_datamodule.py:259] (3/8) Num frame mask: 10
2022-08-08 16:47:29,930 INFO [asr_datamodule.py:272] (3/8) About to create train dataset
2022-08-08 16:47:29,930 INFO [asr_datamodule.py:300] (3/8) Using DynamicBucketingSampler.
2022-08-08 16:47:30,668 INFO [asr_datamodule.py:221] (5/8) Enable MUSAN
2022-08-08 16:47:30,668 INFO [asr_datamodule.py:246] (5/8) Enable SpecAugment
2022-08-08 16:47:30,668 INFO [asr_datamodule.py:247] (5/8) Time warp factor: 80
2022-08-08 16:47:30,669 INFO [asr_datamodule.py:259] (5/8) Num frame mask: 10
2022-08-08 16:47:30,669 INFO [asr_datamodule.py:272] (5/8) About to create train dataset
2022-08-08 16:47:30,669 INFO [asr_datamodule.py:300] (5/8) Using DynamicBucketingSampler.
2022-08-08 16:47:31,057 INFO [asr_datamodule.py:221] (1/8) Enable MUSAN
2022-08-08 16:47:31,057 INFO [asr_datamodule.py:246] (1/8) Enable SpecAugment
2022-08-08 16:47:31,057 INFO [asr_datamodule.py:247] (1/8) Time warp factor: 80
2022-08-08 16:47:31,057 INFO [asr_datamodule.py:259] (1/8) Num frame mask: 10
2022-08-08 16:47:31,058 INFO [asr_datamodule.py:272] (1/8) About to create train dataset
2022-08-08 16:47:31,058 INFO [asr_datamodule.py:300] (1/8) Using DynamicBucketingSampler.
2022-08-08 16:47:33,049 INFO [asr_datamodule.py:316] (0/8) About to create train dataloader
2022-08-08 16:47:33,050 INFO [asr_datamodule.py:333] (0/8) Loading sampler state dict
2022-08-08 16:47:33,268 INFO [asr_datamodule.py:316] (2/8) About to create train dataloader
2022-08-08 16:47:33,269 INFO [asr_datamodule.py:333] (2/8) Loading sampler state dict
2022-08-08 16:47:33,270 INFO [asr_datamodule.py:316] (6/8) About to create train dataloader
2022-08-08 16:47:33,271 INFO [asr_datamodule.py:333] (6/8) Loading sampler state dict
2022-08-08 16:47:33,339 INFO [asr_datamodule.py:316] (4/8) About to create train dataloader
2022-08-08 16:47:33,340 INFO [asr_datamodule.py:333] (4/8) Loading sampler state dict
2022-08-08 16:47:33,390 INFO [asr_datamodule.py:316] (7/8) About to create train dataloader
2022-08-08 16:47:33,392 INFO [asr_datamodule.py:333] (7/8) Loading sampler state dict
2022-08-08 16:47:33,545 INFO [asr_datamodule.py:316] (3/8) About to create train dataloader
2022-08-08 16:47:33,547 INFO [asr_datamodule.py:333] (3/8) Loading sampler state dict
2022-08-08 16:47:34,213 INFO [asr_datamodule.py:316] (5/8) About to create train dataloader
2022-08-08 16:47:34,214 INFO [asr_datamodule.py:333] (5/8) Loading sampler state dict
2022-08-08 16:47:34,730 INFO [asr_datamodule.py:316] (1/8) About to create train dataloader
2022-08-08 16:47:34,731 INFO [asr_datamodule.py:333] (1/8) Loading sampler state dict
Traceback (most recent call last):
  File "/home/storage04/zhuangweiji/workspace/kaldi2/icefall/egs/wenetspeech/ASR/./pruned_transducer_stateless5/train.py", line 1204, in <module>
    main()
  File "/home/storage04/zhuangweiji/workspace/kaldi2/icefall/egs/wenetspeech/ASR/./pruned_transducer_stateless5/train.py", line 1195, in main
    mp.spawn(run, args=(world_size, args), nprocs=world_size, join=True)
  File "/home/storage04/zhuangweiji/tools/anaconda3/envs/k2-py39-cuda10.2-torch1.12/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/storage04/zhuangweiji/tools/anaconda3/envs/k2-py39-cuda10.2-torch1.12/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
    while not context.join():
  File "/home/storage04/zhuangweiji/tools/anaconda3/envs/k2-py39-cuda10.2-torch1.12/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException: 

-- Process 2 terminated with the following error:
Traceback (most recent call last):
  File "/home/storage04/zhuangweiji/tools/anaconda3/envs/k2-py39-cuda10.2-torch1.12/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/home/storage04/zhuangweiji/workspace/kaldi2/icefall/egs/wenetspeech/ASR/pruned_transducer_stateless5/train.py", line 1042, in run
    train_dl = wenetspeech.train_dataloaders(
  File "/home/storage04/zhuangweiji/workspace/kaldi2/icefall/egs/wenetspeech/ASR/pruned_transducer_stateless5/asr_datamodule.py", line 334, in train_dataloaders
    train_dl.sampler.load_state_dict(sampler_state_dict)
  File "/home/storage04/zhuangweiji/workspace/kaldi2/lhotse/lhotse/dataset/sampling/dynamic_bucketing.py", line 174, in load_state_dict
    self._fast_forward()
  File "/home/storage04/zhuangweiji/workspace/kaldi2/lhotse/lhotse/dataset/sampling/dynamic_bucketing.py", line 190, in _fast_forward
    next(self)
  File "/home/storage04/zhuangweiji/workspace/kaldi2/lhotse/lhotse/dataset/sampling/base.py", line 261, in __next__
    batch = self._next_batch()
  File "/home/storage04/zhuangweiji/workspace/kaldi2/lhotse/lhotse/dataset/sampling/dynamic_bucketing.py", line 232, in _next_batch
    batch = next(self.cuts_iter)
StopIteration

lhotse version: '1.5.0.dev+git.08a613a.clean'
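The `StopIteration` in the traceback comes from the sampler's fast-forward step: on resume, the sampler replays `next()` once per batch the checkpoint says was already consumed, and the call fails if the cut stream produces fewer batches than that count (for example because the dataset or batching settings changed). A minimal sketch of that failure mode, using only hypothetical names rather than lhotse itself:

```python
# Minimal reproduction of the fast-forward failure mode (illustrative
# names only). A restored sampler advances its batch iterator once per
# batch recorded in the checkpoint; if the stream is shorter than that
# recorded count, next() raises StopIteration.

def fast_forward(batch_iter, num_consumed_batches):
    """Advance the iterator past the batches seen before the restart."""
    for _ in range(num_consumed_batches):
        next(batch_iter)  # raises StopIteration if the stream runs dry
    return batch_iter

batches = iter([["cut-a"], ["cut-b"], ["cut-c"]])  # only 3 batches exist

try:
    fast_forward(batches, num_consumed_batches=5)  # checkpoint claims 5
except StopIteration:
    print("StopIteration: stream exhausted before fast-forward finished")
```

This is why the error only appears at resume time: a fresh run never fast-forwards, so the mismatch between the checkpoint's counters and the actual stream goes unnoticed until `--start-batch` is used.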

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 16 (5 by maintainers)

Top GitHub Comments

1 reaction
pzelasko commented, Oct 17, 2022

I think I finally realized what the issue is… please try again with this PR: https://github.com/lhotse-speech/lhotse/pull/854

You will need to start a new training for the fix to kick in, as the existing checkpoints are already “corrupted” (unless you manually edit kept_batches/kept_cuts).
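For anyone who does want to salvage an existing checkpoint, a purely illustrative sketch of the manual edit mentioned above, assuming the sampler state is a plain dict and that resetting the counters to zero (i.e. giving up the fast-forward and sampling from scratch) is acceptable. The key names `kept_batches`/`kept_cuts` come from the comment above, but the exact layout depends on your lhotse version, so inspect your checkpoint before editing anything:

```python
# Purely illustrative: reset the fast-forward counters in a sampler state
# dict so that restoring it no longer replays more batches than the dataset
# can produce. Verify the key names against your actual checkpoint first.

def reset_fast_forward_counters(sampler_state):
    for key in ("kept_batches", "kept_cuts"):
        if key in sampler_state:
            sampler_state[key] = 0  # sample from scratch on resume
    return sampler_state

# Hypothetical sampler state as it might appear inside a checkpoint:
state = {"kept_batches": 24000, "kept_cuts": 1_200_000, "seed": 42}
print(reset_fast_forward_counters(state))
```

In practice you would `torch.load` the checkpoint, patch the sampler sub-dict like this, and `torch.save` it back; whether zeroing the counters is the right repair (as opposed to the proper fix in the PR above) is an assumption, not something the maintainers prescribe.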

1 reaction
danpovey commented, Sep 30, 2022

Maybe he switched datasets? If that’s the case he might want to comment out the part where it loads the sampler state dict.
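A less all-or-nothing alternative to commenting the load out entirely — a hedged sketch, not icefall’s actual code — is to guard the restore so that a mismatched dataset falls back to fresh sampling instead of crashing. `maybe_load_sampler_state` is a hypothetical helper:

```python
import logging

# Hedged sketch of guarding the sampler restore (e.g. where
# train_dataloaders() currently calls load_state_dict directly).
# maybe_load_sampler_state is a hypothetical helper, not part of icefall.
def maybe_load_sampler_state(sampler, sampler_state_dict):
    if sampler_state_dict is None:
        return
    try:
        sampler.load_state_dict(sampler_state_dict)
    except StopIteration:
        # The dataset no longer matches the checkpoint (e.g. it was
        # switched); ignore the saved state and sample from the start.
        logging.warning("Could not fast-forward sampler state; ignoring it.")
```

Note the trade-off: silently ignoring the state means resumed training revisits data it already saw, so a warning (rather than a silent pass) is worth keeping.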


Top Results From Across the Web

RuntimeError: Error(s) in loading state_dict for Actor - torch ...
Traceback: I am facing a runtime error while loading the state_dict for actor model .I searched google but couldnt find similar issues ....
Read more >
loaded state dict has a different number of parameter groups ...
I am seeing the similar issue. I have the exact same model but got error ValueError: loaded state dict has a different number...
Read more >
How to fix 'Error(s) in loading state_dict for AWD_LSTM' when ...
When executing this code, I ran into the following error: RuntimeError: Error(s) in loading state_dict for AWD_LSTM: size mismatch for encoder.
Read more >
RuntimeError: Error(s) in loading state_dict for SimCLR
Based on the error message it seems you are trying to load a state_dict of a resnet-like model into your custom model.
Read more >
logging.config — Logging configuration — Python 3.11.1 ...
Takes the logging configuration from a dictionary. The contents of this dictionary are described in Configuration dictionary schema below. If an error is ......
Read more >
