GPU out of memory with Reformer enwik8 model

❓ Questions & Help

I’m trying to run the pretrained model google/reformer-enwik8, but I get CUDA out-of-memory errors unless I limit the sequences to one-fourth of the model’s capacity (~16k tokens instead of 65k).

This happens on a Titan Xp with 12 GB of memory; I expected all the tricks of the Reformer to make the model fit with the original sequence size.

The code I’m running:

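import torch
from torch.utils.data import DataLoader
from transformers import ReformerModelWithLMHead

# Load the pretrained model and move it to the GPU
model = ReformerModelWithLMHead.from_pretrained('google/reformer-enwik8')
model.cuda()

config = model.config
max_len = config.max_position_embeddings
# Enwik8Dataset is my own Dataset subclass; path points to the enwik8 data
dataset = Enwik8Dataset(
    path, max_len, pad_id=config.pad_token_id,
    eos_id=config.eos_token_id)
loader = DataLoader(dataset, batch_size=1, shuffle=False)

# Accumulate the per-batch loss without tracking gradients
acc_loss = 0
for batch in loader:
    batch = batch.to(model.device)  # no-op if the batch is already on the GPU
    with torch.no_grad():
        batch_loss = model(input_ids=batch, labels=batch)[0]
    acc_loss += batch_loss.mean().item()

# Mean loss per sequence (batch_size is 1)
acc_loss /= len(dataset)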

The Enwik8Dataset class inherits from Dataset and does the basic data preprocessing; I can post the code if necessary.

Link to the original question on Stack Overflow: https://stackoverflow.com/questions/62373033/gpu-out-of-memory-with-enwik8-reformer-from-huggingface

Issue Analytics

Created 3 years ago
Comments: 9 (8 by maintainers)

Top GitHub Comments

3 reactions

erickrf commented, Jun 23, 2020

OK, so I found that the main culprit was the Trainer, which stores all model predictions in GPU memory during evaluation: https://github.com/huggingface/transformers/blob/c01480bba3b2f0bd8516679476235f4701c21b3b/src/transformers/trainer.py#L775
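To get a feel for why keeping every prediction on the GPU hurts, here is a rough back-of-the-envelope sketch. The numbers are assumptions for illustration, not values reported in this issue: a vocabulary of 258 entries, float32 logits, and 76 evaluation sequences of 65,536 tokens each (roughly a 5 MB test split). The helper name accumulated_logits_bytes is hypothetical.

```python
def accumulated_logits_bytes(num_sequences, seq_len, vocab_size, bytes_per_value=4):
    """Bytes needed to keep the logits of every evaluated sequence in memory."""
    return num_sequences * seq_len * vocab_size * bytes_per_value


# Assumed, illustrative values (not taken from the issue)
SEQ_LEN = 65536        # full sequence length of the model
VOCAB_SIZE = 258       # assumed vocabulary size
NUM_SEQUENCES = 76     # assumed number of evaluation sequences

per_sequence = accumulated_logits_bytes(1, SEQ_LEN, VOCAB_SIZE)
total = accumulated_logits_bytes(NUM_SEQUENCES, SEQ_LEN, VOCAB_SIZE)

print(f"logits for one sequence:   {per_sequence / 2**20:.1f} MiB")
print(f"logits for all sequences:  {total / 2**30:.2f} GiB")
```

Under these assumptions the stored logits alone take several GiB, on top of the model weights and activations, so a 12 GB card could plausibly run out of memory even though a single forward pass fits.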

Passing prediction_loss_only=True avoided that. By the way, I believe this should be the default value in the Trainer, and the cat operation could use CPU tensors in case the validation dataset is big.
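For anyone writing their own evaluation loop, the two ideas above could look roughly like this. It is a minimal sketch and not the Trainer's actual code: the evaluate function, its loss_only flag, and the tiny linear model are hypothetical stand-ins. When predictions are needed, each batch of logits is moved to the CPU before it is stored, so the final torch.cat runs on CPU tensors.

```python
import torch
import torch.nn.functional as F


def evaluate(model, batches, loss_only=True):
    """Return the mean loss and, optionally, all predictions as one CPU tensor."""
    total_loss = 0.0
    cpu_chunks = []
    model.eval()
    with torch.no_grad():
        for inputs, labels in batches:
            logits = model(inputs)
            total_loss += F.cross_entropy(logits, labels).item()
            if not loss_only:
                # Move each chunk off the GPU right away so that
                # predictions never pile up in GPU memory
                cpu_chunks.append(logits.detach().cpu())
    preds = torch.cat(cpu_chunks, dim=0) if cpu_chunks else None
    return total_loss / max(len(batches), 1), preds


# Tiny stand-in model and data, just to exercise the loop
model = torch.nn.Linear(4, 3)
batches = [(torch.randn(2, 4), torch.tensor([0, 2])) for _ in range(3)]

mean_loss, preds = evaluate(model, batches, loss_only=True)
print(mean_loss, preds)  # preds is None because only the loss was requested
```

With loss_only=True only a Python float is kept per batch, which is usually enough when the goal is just a validation loss.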
