GPU out of memory with Reformer enwik8 model
❓ Questions & Help
I’m trying to run the pretrained model google/reformer-enwik8
but I’m getting CUDA out-of-memory errors unless I limit the sequences to one-fourth of the model’s capacity (~16k tokens instead of 65k).
This happens on a Titan Xp with 12 GB of memory; I expected the Reformer’s memory-saving tricks to let the model fit at its full sequence length.
The code I’m running:
```python
import torch
from torch.utils.data import DataLoader
from transformers import ReformerModelWithLMHead

model = ReformerModelWithLMHead.from_pretrained('google/reformer-enwik8')
model.cuda()
model.eval()
config = model.config
max_len = config.max_position_embeddings  # ~65k for this checkpoint

dataset = Enwik8Dataset(
    path, max_len, pad_id=config.pad_token_id,
    eos_id=config.eos_token_id)
loader = DataLoader(dataset, batch_size=1, shuffle=False)

acc_loss = 0
for batch in loader:
    batch = batch.cuda()  # move the inputs to the same device as the model
    with torch.no_grad():
        batch_loss = model(input_ids=batch, labels=batch)[0]
        acc_loss += batch_loss.mean().item()
acc_loss /= len(dataset)
```
The Enwik8Dataset inherits from Dataset and does the basic data preprocessing; I can post the code if necessary.
A link to the original question on Stack Overflow: https://stackoverflow.com/questions/62373033/gpu-out-of-memory-with-enwik8-reformer-from-huggingface
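One workaround, given that ~16k-token sequences reportedly did fit on the card, is to evaluate each long example in shorter windows. The sketch below is not part of the original issue; it reuses `model` and `loader` from the snippet above, and the 16,384-token window size and per-window loss averaging are assumptions:

```python
import torch

window = 16384  # assumed length that fits in 12 GB, per the report above

acc_loss = 0.0
n_windows = 0
with torch.no_grad():
    for batch in loader:
        batch = batch.cuda()
        # Split the full ~65k-token sequence into shorter windows so the
        # attention and feed-forward activations stay within GPU memory.
        for start in range(0, batch.size(1), window):
            chunk = batch[:, start:start + window]
            loss = model(input_ids=chunk, labels=chunk)[0]
            acc_loss += loss.mean().item()
            n_windows += 1

print(acc_loss / max(n_windows, 1))
```

Note that this changes what is measured: tokens near the start of each window lose the context of the preceding windows, so the averaged loss is not identical to evaluating the full 65k-token sequence in one pass.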
Issue Analytics
- Created: 3 years ago
- Comments: 9 (8 by maintainers)
Top Results From Across the Web

The Reformer - Pushing the limits of language modeling
How the Reformer uses less than 8GB of RAM to train on sequences of half a million tokens. The Reformer model as introduced...

Reformer Reproducibility - Long Draft – Weights & Biases
The Reformer attempts to achieve this by reducing the memory footprint of the Transformer and thus enabling modelling of deeper models and longer...

Google's Reformer Works On A Single GPU & Is Memory ...
This deep machine learning model is used for various natural language processing (NLP) tasks such as language understanding, machine translation ...

trax-ml/community - Gitter
1) Did you compare the performance of Reformer against local attention with context ... once but many times the GPU-allocator runs out of...

Applying and Adapting the Reformer as a Computationally ...
the baseline model BiDAF, we found significant flaws with the Reformer, most ... tation with only 16 GB of memory and 1 GPU...
Top GitHub Comments
Ok, so I found that the main culprit was that the `Trainer` was storing all model predictions in GPU memory during evaluation, at https://github.com/huggingface/transformers/blob/c01480bba3b2f0bd8516679476235f4701c21b3b/src/transformers/trainer.py#L775
Passing `prediction_loss_only=True` avoided that. By the way, I believe this should be the default value in the `Trainer`, and that the `cat` operation could use CPU tensors, in case the validation dataset is big.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
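For reference, a minimal sketch of evaluating with the flag mentioned above. It assumes a transformers version whose `TrainingArguments` accepts `prediction_loss_only`, and it uses a hypothetical `eval_dataset` (an Enwik8Dataset-like dataset whose items are dicts with `input_ids` and `labels`, as the default collator expects) plus a hypothetical `output_dir`:

```python
from transformers import Trainer, TrainingArguments

# Hedged sketch: `output_dir` and `eval_dataset` are placeholders, and
# `prediction_loss_only` is assumed to be accepted by TrainingArguments
# in the installed version of transformers.
args = TrainingArguments(
    output_dir="reformer-enwik8-eval",  # hypothetical output path
    per_device_eval_batch_size=1,
    prediction_loss_only=True,  # only keep the loss; do not accumulate logits
)

trainer = Trainer(model=model, args=args, eval_dataset=eval_dataset)
metrics = trainer.evaluate()
print(metrics["eval_loss"])
```

With `prediction_loss_only=True`, the evaluation loop skips accumulating the full logits tensor, which is what exhausts GPU memory when the sequences are tens of thousands of tokens long.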