NotPSDError: Matrix not positive definite after repeatedly adding jitter up to 1.0e-06 when running on GPU

I get the following error:

NotPSDError: Matrix not positive definite after repeatedly adding jitter up to 1.0e-06

The code below works when running on CPU, but not when I switch to GPU. Why does it only work on CPU and not on GPU too?

I am training a deep GP (DGP) for regression with PyTorch Lightning, constructed as follows. The input dimension to the first DGP layer is 256:

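import torch
import gpytorch
import pytorch_lightning as pl
from gpytorch.mlls import DeepApproximateMLL, VariationalELBO
from pytorch_lightning.loggers import TensorBoardLogger
from torch.utils.data import DataLoader, random_split

# DeepGP and load_dataset are the user's own definitions (not shown in the issue).

class PL_model(pl.LightningModule):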
    def __init__(self,
                 batch_size,
                 lr,
                 betas,
                 num_samples,
                 num_output_dims
                ):
        super().__init__()
        # Training parameters
        self.batch_size = batch_size
        self.lr = lr
        self.betas = betas
        self.num_samples = num_samples
        self.num_output_dims = num_output_dims
        
        self.gpmodel = DeepGP(256, self.num_output_dims)
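        # NOTE: VariationalELBO's third argument is num_data, the total number of
        # training points, not the mini-batch size.
        self.mll = DeepApproximateMLL(VariationalELBO(self.gpmodel.likelihood, self.gpmodel, self.batch_size))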
    
    def forward(self, x):
        # compute prediction
        return self.gpmodel(x)
    
    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.lr, betas=self.betas, weight_decay=1e-3)
    
    def training_step(self, batch, batch_idx):
        x, y = batch
        with gpytorch.settings.num_likelihood_samples(self.num_samples):
            output = self(x)
            loss = -self.mll(output, y)
        return loss
    
    def validation_step(self, batch, batch_idx):
        x, y = batch
        with torch.no_grad():
            lls = self.gpmodel.likelihood.log_marginal(y, self(x))
        return -lls
    
    def setup(self, stage=None):
        dataset = load_dataset()
        train_split = int(0.8 * len(dataset))
        val_split = len(dataset) - train_split
        self.train_set, self.val_set = random_split(dataset, [train_split, val_split])
        
    def train_dataloader(self):
        return DataLoader(self.train_set, batch_size=self.batch_size, shuffle=True)
    
    def val_dataloader(self):
        return DataLoader(self.val_set, batch_size=self.batch_size, shuffle=False)
    
model = PL_model(
    batch_size=32,
    lr=0.1,
    betas=(0.85,0.89),
    num_samples=3,
    num_output_dims=2
)

trainer = pl.Trainer(
    min_epochs=5,
    max_epochs=8,
    gpus=1,
    logger=TensorBoardLogger("lightning_logs/", name="DGP")
)
trainer.fit(model)
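The error message describes a retry strategy: when a Cholesky factorization fails, a small "jitter" value is added to the matrix diagonal and the factorization is attempted again, with larger jitter each time, until a limit is reached. A minimal sketch of that idea in plain PyTorch follows. It is an illustration, not GPyTorch's implementation; the function name `cholesky_with_jitter`, the starting jitter, and the tenfold growth schedule are assumptions.

```python
import torch


def cholesky_with_jitter(K: torch.Tensor, jitter: float = 1e-8, max_tries: int = 3):
    """Hypothetical helper: factor K = L @ L.T, adding growing diagonal jitter on failure.

    Illustrative only; the x10-per-try jitter schedule is an assumption,
    not GPyTorch's actual code. Returns (L, jitter_used).
    """
    eye = torch.eye(K.shape[-1], dtype=K.dtype, device=K.device)
    L, info = torch.linalg.cholesky_ex(K)  # info == 0 means the factorization succeeded
    if bool((info == 0).all()):
        return L, 0.0
    eps = 0.0
    for i in range(max_tries):
        eps = jitter * (10 ** i)
        L, info = torch.linalg.cholesky_ex(K + eps * eye)
        if bool((info == 0).all()):
            return L, eps
    raise RuntimeError(
        f"Matrix not positive definite after adding jitter up to {eps:.1e}"
    )


# A singular (rank-1) PSD matrix: plain Cholesky fails, a little jitter fixes it.
v = torch.tensor([1.0, 2.0, 3.0], dtype=torch.float64)
K = torch.outer(v, v)
L, used = cholesky_with_jitter(K)
print(f"succeeded with jitter {used:.1e}")
```

Jitter trades accuracy for stability: the returned factor is for K + εI rather than K itself, which is why libraries cap how large ε may grow before giving up.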

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
gpleiss commented, Dec 9, 2022

@mbelalsh It happens: this is a known numerical-stability issue with Gaussian processes. It comes from a property of your data together with the fact that all computations are done in single precision.
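To see how single precision contributes, consider a contrived example (chosen to make rounding visible; it is not the issue's data): a singular PSD matrix whose diagonal entries are in the hundreds. In float32 the spacing between representable numbers near 100 is about 7.6e-6, so jitter of 1e-6 is rounded away entirely and the factorization still fails; in float64 the same jitter survives and Cholesky succeeds. The function name `jitter_effect` is a hypothetical helper.

```python
import torch


def jitter_effect(dtype: torch.dtype, jitter: float = 1e-6):
    """Return (jitter_changed_K, cholesky_succeeded) for a contrived singular PSD matrix."""
    v = torch.tensor([10.0, 20.0, 30.0], dtype=dtype)
    K = torch.outer(v, v)  # rank 1, diagonal = 100, 400, 900
    K_jit = K + jitter * torch.eye(3, dtype=dtype)
    _, info = torch.linalg.cholesky_ex(K_jit)  # info == 0 means success
    return not torch.equal(K_jit, K), int(info) == 0


for dtype in (torch.float32, torch.float64):
    changed, ok = jitter_effect(dtype)
    print(dtype, "jitter changed K:", changed, "| Cholesky succeeded:", ok)
# Expected: float32 -> False, False; float64 -> True, True
```

Real kernel matrices fail for the same underlying reason: the added jitter, or the matrix's smallest eigenvalues, fall below what float32 can resolve relative to its largest entries. CPU and GPU use different numerical kernels and can round differently, so a borderline matrix may factor on one device and fail on the other.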

Try switching to double precision, or using smaller learning rates on your optimizer.
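A minimal sketch of both suggestions in plain PyTorch, assuming a generic `torch.nn.Module`: cast the model and each batch to float64, and build the optimizer with a smaller learning rate than the `lr=0.1` used in the question. The helper names `to_double` and `make_optimizer` and the default `lr=0.01` are illustrative assumptions, and `torch.nn.Linear` stands in for the DGP. With PyTorch Lightning, passing `precision=64` to `pl.Trainer` handles the double-precision casting of the model; check the documentation for the Lightning version you use.

```python
import torch


def to_double(model: torch.nn.Module, *tensors: torch.Tensor):
    """Hypothetical helper: cast a model's parameters/buffers and some tensors to float64."""
    return model.double(), tuple(t.double() for t in tensors)


def make_optimizer(model: torch.nn.Module, lr: float = 0.01):
    """Adam with the question's betas/weight decay but a smaller (assumed) learning rate."""
    return torch.optim.Adam(model.parameters(), lr=lr, betas=(0.85, 0.89), weight_decay=1e-3)


# Usage with a stand-in model; in the question this would be the DeepGP inside PL_model.
model = torch.nn.Linear(256, 2)
x, y = torch.randn(32, 256), torch.randn(32, 2)
model, (x, y) = to_double(model, x, y)
optimizer = make_optimizer(model)
loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()
optimizer.step()
```

Double precision roughly doubles memory use and is much slower on many consumer GPUs, so it is worth trying a smaller learning rate first if speed matters.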

0 reactions
mbelalsh commented, Dec 8, 2022

@gpleiss I got the error when using VNNGP at prediction time. My data is already normalized.
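Normalizing data rescales it but does not remove repeated or nearly repeated inputs, which produce identical or almost identical rows in a kernel matrix and make it singular or badly conditioned. A quick check in plain PyTorch follows; the function name `input_diagnostics` and the distance tolerance `tol` are illustrative assumptions. It reports the per-feature mean and standard deviation alongside the number of near-duplicate input pairs.

```python
import torch


def input_diagnostics(x: torch.Tensor, tol: float = 1e-6):
    """Hypothetical helper: per-feature mean/std and count of input pairs closer than tol."""
    mean = x.mean(dim=0)
    std = x.std(dim=0)
    # Exact pairwise Euclidean distances (avoid the faster but less accurate matmul path)
    d = torch.cdist(x, x, compute_mode="donot_use_mm_for_euclid_dist")
    n = x.shape[0]
    rows, cols = torch.triu_indices(n, n, offset=1)  # each pair once, no diagonal
    near_duplicates = int((d[rows, cols] < tol).sum())
    return mean, std, near_duplicates


x = torch.tensor([[0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [2.0, 3.0]])
mean, std, n_dup = input_diagnostics(x)
print("mean:", mean, "std:", std, "near-duplicate pairs:", n_dup)
```

For large datasets the full n-by-n distance matrix is expensive, so running the check on a subsample or in chunks is more practical.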
