Invalid Covariance Matrix Error in DeepDPM_alternations.py

Hello,

Thank you for the amazing work and for publishing the code.

I encountered the following error when running the DeepDPM_alternations.py script on MNIST.

The command I used is below. It is copied from the example, with only --dir changed and --offline added:

python DeepDPM_alternations.py --latent_dim 10 --dataset mnist --lambda_ 0.005 --lr 0.002 --init_k 3 --train_cluster_net 200 --alternate --init_cluster_net_using_centers --reinit_net_at_alternation --dir ./dataset/ --pretrain_path ./saved_models/ae_weights/mnist_e2e.zip --number_of_ae_alternations 3 --transform None --log_metrics_at_train True --gpus 1 --epoch 1 --offline

I got the following error. I tried several times, and it always occurred around iterations 30 to 33.

Evaluating...
Epoch 30: 100%|█████████| 548/548 [00:17<00:00, 30.63it/s, loss=0.00362, v_num=]
Epoch 31:  86%|███████▋ | 469/548 [00:14<00:02, 31.33it/s, loss=0.00345, v_num=]Evaluating...
NMI : 0.49259220412522065, ARI: 0.28473904520954585, ACC: 0.30912, current K: 3
Validating: 0it [00:00, ?it/s]
Epoch 31:  86%|███████▋ | 470/548 [00:15<00:02, 30.05it/s, loss=0.00345, v_num=]
Epoch 0:   0%|                                          | 0/548 [07:41<?, ?it/s]
Traceback (most recent call last):
  File "/home/shichang/paper_reproduce/DeepDPM/DeepDPM_alternations.py", line 202, in <module>
    trainer.fit(model, train_loader, val_loader)
  File "/home/shichang/anaconda3/envs/deepdpm/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 499, in fit
    self.dispatch()
  File "/home/shichang/anaconda3/envs/deepdpm/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 546, in dispatch
    self.accelerator.start_training(self)
  File "/home/shichang/anaconda3/envs/deepdpm/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator.py", line 73, in start_training
    self.training_type_plugin.start_training(trainer)
  File "/home/shichang/anaconda3/envs/deepdpm/lib/python3.9/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 114, in start_training
    self._results = trainer.run_train()
  File "/home/shichang/anaconda3/envs/deepdpm/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 633, in run_train
    self.train_loop.on_train_epoch_start(epoch)
  File "/home/shichang/anaconda3/envs/deepdpm/lib/python3.9/site-packages/pytorch_lightning/trainer/training_loop.py", line 203, in on_train_epoch_start
    self.trainer.call_hook("on_train_epoch_start")
  File "/home/shichang/anaconda3/envs/deepdpm/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1102, in call_hook
    output = hook_fx(*args, **kwargs)
  File "/home/shichang/paper_reproduce/DeepDPM/src/AE_ClusterPipeline.py", line 250, in on_train_epoch_start
    self._init_clusters()
  File "/home/shichang/paper_reproduce/DeepDPM/src/AE_ClusterPipeline.py", line 121, in _init_clusters
    self.clustering.init_cluster(self.train_dataloader(), self.val_dataloader(), logger=self.logger, centers=centers, init_num=self.init_clusternet_num)
  File "/home/shichang/paper_reproduce/DeepDPM/src/clustering_models/clusternet.py", line 57, in init_cluster
    self.fit_cluster(train_loader, val_loader, logger, centers)
  File "/home/shichang/paper_reproduce/DeepDPM/src/clustering_models/clusternet.py", line 65, in fit_cluster
    self.model.fit(train_loader, val_loader, logger, self.args.train_cluster_net, centers=centers)
  File "/home/shichang/paper_reproduce/DeepDPM/src/clustering_models/clusternet_modules/clusternet_trainer.py", line 42, in fit
    cluster_trainer.fit(self.cluster_model, train_loader, val_loader)
  File "/home/shichang/anaconda3/envs/deepdpm/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 499, in fit
    self.dispatch()
  File "/home/shichang/anaconda3/envs/deepdpm/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 546, in dispatch
    self.accelerator.start_training(self)
  File "/home/shichang/anaconda3/envs/deepdpm/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator.py", line 73, in start_training
    self.training_type_plugin.start_training(trainer)
  File "/home/shichang/anaconda3/envs/deepdpm/lib/python3.9/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 114, in start_training
    self._results = trainer.run_train()
  File "/home/shichang/anaconda3/envs/deepdpm/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 637, in run_train
    self.train_loop.run_training_epoch()
  File "/home/shichang/anaconda3/envs/deepdpm/lib/python3.9/site-packages/pytorch_lightning/trainer/training_loop.py", line 577, in run_training_epoch
    self.trainer.run_evaluation(on_epoch=True)
  File "/home/shichang/anaconda3/envs/deepdpm/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 726, in run_evaluation
    output = self.evaluation_loop.evaluation_step(batch, batch_idx, dataloader_idx)
  File "/home/shichang/anaconda3/envs/deepdpm/lib/python3.9/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 166, in evaluation_step
    output = self.trainer.accelerator.validation_step(args)
  File "/home/shichang/anaconda3/envs/deepdpm/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator.py", line 177, in validation_step
    return self.training_type_plugin.validation_step(*args)
  File "/home/shichang/anaconda3/envs/deepdpm/lib/python3.9/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 131, in validation_step
    return self.lightning_module.validation_step(*args, **kwargs)
  File "/home/shichang/paper_reproduce/DeepDPM/src/clustering_models/clusternet_modules/clusternetasmodel.py", line 275, in validation_step
    cluster_loss = self.training_utils.cluster_loss_function(
  File "/home/shichang/paper_reproduce/DeepDPM/src/clustering_models/clusternet_modules/utils/training_utils.py", line 235, in cluster_loss_function
    gmm_k = MultivariateNormal(model_mus[k].double().to(device=self.device), model_covs[k].double().to(device=self.device))
  File "/home/shichang/anaconda3/envs/deepdpm/lib/python3.9/site-packages/torch/distributions/multivariate_normal.py", line 146, in __init__
    super(MultivariateNormal, self).__init__(batch_shape, event_shape, validate_args=validate_args)
  File "/home/shichang/anaconda3/envs/deepdpm/lib/python3.9/site-packages/torch/distributions/distribution.py", line 55, in __init__
    raise ValueError(
ValueError: Expected parameter covariance_matrix (Tensor of shape (10, 10)) of distribution MultivariateNormal(loc: torch.Size([10]), covariance_matrix: torch.Size([10, 10])) to satisfy the constraint PositiveDefinite(), but found invalid values:
tensor([[ 0.5951,  0.0092,  0.2854,  0.0728, -0.2996,  0.1251, -0.1172, -0.0078,
          0.0765,  0.1235],
        [ 0.0092,  0.5513,  0.0643, -0.1219,  0.1693,  0.1630, -0.1415, -0.0218,
         -0.1041,  0.0764],
        [ 0.2854,  0.0643,  0.6676, -0.2186, -0.2203,  0.0510, -0.0280, -0.1126,
          0.1910,  0.1136],
        [ 0.0728, -0.1219, -0.2186,  0.8729, -0.1988, -0.1151, -0.0771,  0.1936,
         -0.1680,  0.1279],
        [-0.2996,  0.1693, -0.2203, -0.1988,  0.7979, -0.0535,  0.0689, -0.0186,
         -0.1713, -0.0783],
        [ 0.1251,  0.1630,  0.0510, -0.1151, -0.0535,  0.4169, -0.1412,  0.0098,
          0.0774,  0.1063],
        [-0.1172, -0.1415, -0.0280, -0.0771,  0.0689, -0.1412,  0.5972, -0.0558,
          0.0094, -0.0770],
        [-0.0078, -0.0218, -0.1126,  0.1936, -0.0186,  0.0098, -0.0558,  0.4313,
          0.0730,  0.0873],
        [ 0.0765, -0.1041,  0.1910, -0.1680, -0.1713,  0.0774,  0.0094,  0.0730,
          0.6265,  0.0646],
        [ 0.1235,  0.0764,  0.1136,  0.1279, -0.0783,  0.1063, -0.0770,  0.0873,
          0.0646,  0.4388]], device='cuda:1', dtype=torch.float64)
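
One way to investigate an error like this is to test the rejected covariance directly: check that it is symmetric, attempt a Cholesky factorization, and, if that fails, symmetrize it and add a small value to the diagonal. The sketch below does this in plain Python so it runs without dependencies. The helper names (`cholesky`, `is_symmetric`, `regularize`) and the jitter schedule are illustrative assumptions, not part of DeepDPM or PyTorch. Note that the tensor printed above is rounded to four decimals, so the printout alone may not reveal why the check failed.

```python
import math


def cholesky(matrix):
    """Return the lower-triangular Cholesky factor of a square matrix,
    or None if a non-positive pivot shows it is not positive definite."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - partial
                if pivot <= 0.0:
                    return None
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def is_symmetric(matrix, tol=1e-8):
    """Check that every off-diagonal pair agrees within tol."""
    n = len(matrix)
    return all(
        abs(matrix[i][j] - matrix[j][i]) <= tol
        for i in range(n)
        for j in range(i)
    )


def regularize(matrix, jitter=1e-6, max_tries=10):
    """Symmetrize the matrix, then add a growing jitter to the diagonal
    until the Cholesky factorization succeeds.

    Returns (regularized_matrix, jitter_used). The starting jitter and
    the factor-of-10 schedule are arbitrary choices for illustration.
    """
    n = len(matrix)
    sym = [
        [0.5 * (matrix[i][j] + matrix[j][i]) for j in range(n)]
        for i in range(n)
    ]
    if cholesky(sym) is not None:
        return sym, 0.0
    eps = jitter
    for _ in range(max_tries):
        candidate = [
            [sym[i][j] + (eps if i == j else 0.0) for j in range(n)]
            for i in range(n)
        ]
        if cholesky(candidate) is not None:
            return candidate, eps
        eps *= 10.0
    raise ValueError("could not make the matrix positive definite")


if __name__ == "__main__":
    # A singular (rank-1) covariance: positive semi-definite, not positive definite.
    degenerate = [[1.0, 1.0], [1.0, 1.0]]
    print("symmetric:", is_symmetric(degenerate))
    print("positive definite:", cholesky(degenerate) is not None)
    fixed, used = regularize(degenerate)
    print("jitter used:", used)
```

In the training code, the same idea could be applied to `model_covs[k]` before it is passed to `MultivariateNormal`, using tensor operations instead of nested lists. Whether this matches the fix the maintainers shipped is not stated in the thread.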

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 12 (6 by maintainers)

Top GitHub Comments

meitarronen commented, Aug 4, 2022 (1 reaction)

@shwetakatti98 @blakechi @LifelongReID @ShichangZh Please take a look at the last update of the code, the issues should be fixed 😃

npikki commented, Dec 14, 2022 (0 reactions)

Hi! I have just run into exactly the same issue, using the parameters specified in the README and the dependencies from the requirements file (except for numpy 1.21 instead of 1.20, since 1.20 seems incompatible with pandas 1.4). Could you please reopen the issue? Thanks!
