CUDA memory overflow during predictions
I am running a simple multitask GP on an NVIDIA P6000 GPU (24 GB of memory). The following is my code.
GP Model:
class MultitaskGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super(MultitaskGPModel, self).__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.MultitaskMean(
            gpytorch.means.ConstantMean(), num_tasks=128
        )
        self.covar_module = gpytorch.kernels.MultitaskKernel(
            gpytorch.kernels.RBFKernel(), num_tasks=128, rank=1
        )

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultitaskMultivariateNormal(mean_x, covar_x)
Model Import:
import torch
import gpytorch
from gp_models import MultitaskGPModel

inputs, targets = fine_tune_data()  # inputs and targets have size [7200, 128]
inputs, targets = inputs.cuda(), targets.cuda()  # move the data to the GPU

# The likelihood was not defined in the original snippet; a multitask
# Gaussian likelihood matching num_tasks=128 is assumed here.
likelihood = gpytorch.likelihoods.MultitaskGaussianLikelihood(num_tasks=128).cuda()
gp_model = MultitaskGPModel(inputs, targets, likelihood).cuda()
# gp_model = torch.nn.DataParallel(gp_model)
gp_model.set_train_data(inputs, targets, strict=False)

mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, gp_model)
# 'opt.optimizer' in the original snippet is assumed to be torch.optim.Adam
gp_optimizer = torch.optim.Adam([
    {'params': gp_model.parameters()},  # includes GaussianLikelihood parameters
], lr=0.02)
Training:
def fine_tune_train():
    gp_model.train()
    likelihood.train()
    n_iter = 50
    for i in range(n_iter):
        gp_optimizer.zero_grad()
        output = gp_model(inputs)
        loss = -mll(output, targets)
        # retain_graph=True keeps the whole autograd graph alive across
        # iterations, which by itself inflates GPU memory use
        loss.backward(retain_graph=True)
        print('Iter %d/%d - Loss: %.3f' % (i + 1, n_iter, loss.item()))
        gp_optimizer.step()
However, when a prediction is made using the gp_model:

prediction = likelihood(gp_model(test))  # test has size [80, 128]

the GPU runs out of memory. Can someone help me identify the memory leak?
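For reference, the usual way to keep exact-GP prediction memory down in GPyTorch is to predict in eval mode under torch.no_grad() with the fast_pred_var setting, and to feed the test points in mini-batches. A minimal sketch, reusing the names from the snippets above (the batch size of 16 is an arbitrary choice):

import torch
import gpytorch

gp_model.eval()
likelihood.eval()

means = []
# no_grad avoids building the autograd graph; fast_pred_var computes
# predictive variances with GPyTorch's cheaper LOVE approximation
with torch.no_grad(), gpytorch.settings.fast_pred_var():
    for batch in test.split(16):           # mini-batches of test points
        pred = likelihood(gp_model(batch))
        means.append(pred.mean.cpu())      # move results off the GPU
prediction_mean = torch.cat(means, dim=0)  # shape [80, 128]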
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
You have 128 tasks - with the 7200 train points and the 80 test points, your full joint covariance matrix is n x n, with n = 128 * (7200 + 80) ~ 1M. This is your memory leak.
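To make that size concrete, a quick back-of-the-envelope in Python, assuming dense float32 storage:

num_tasks = 128
n_train, n_test = 7200, 80

n = num_tasks * (n_train + n_test)  # side length of the joint covariance
entry_bytes = 4                     # float32 = 4 bytes per entry
total = n * n * entry_bytes

print(f"n = {n:,}")                                 # n = 931,840
print(f"dense covariance ~ {total / 1e12:.1f} TB")  # ~3.5 TB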
I have the same question. I run a simple multitask GP on an NVIDIA 3090 Ti GPU (24 GB of memory). The input data has shape (3500, 42) and the output data has shape (3500, 3). I use this data to train the model, then make predictions with the trained model, and memory is consumed quickly until OOM.

(Screenshot: GPU memory during training.)
(Screenshot: GPU memory during prediction.)

Afterwards the memory stays at 13.2 GB. I tried torch.cuda.empty_cache() to release the memory, but it didn't work - it only releases about half of the memory.

Following is my code (the GP model, training, and prediction snippets are missing from the archived page).
The versions are:
gpytorch: 1.6.0
python: 3.9.7
pytorch: 1.10.0
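A plausible reason empty_cache() only returns part of the memory: PyTorch's caching allocator can only release blocks that no live tensor still references, and after an exact-GP prediction the model keeps cached test-time quantities alive (plus the output distribution, if it is still in scope). A sketch of dropping the references first - the variable name pred is hypothetical, and clearing prediction_strategy relies on GPyTorch's ExactGP storing its test-time caches on that attribute:

import gc
import torch

del pred                             # drop the output distribution
gp_model.prediction_strategy = None  # drop ExactGP's cached prediction solves
                                     # (an assumption about GPyTorch internals)
gc.collect()                         # release lingering Python references
torch.cuda.empty_cache()             # now the allocator can return the blocks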