Slow evaluation using Trainer with TPUs in Colab
See original GitHub issue

Environment info
- transformers version: 4.3.3
- Platform: Linux-4.19.112+-x86_64-with-Ubuntu-18.04-bionic
- Python version: 3.7.10
- PyTorch version (GPU?): 1.9.0a0+7a178a8 (False)
- Tensorflow version (GPU?): 2.4.1 (False)
- Using GPU in script?: No (using a TPU)
- Using distributed or parallel set-up in script?: No
Model I am using (Bert, XLNet …): BERT
I’m seeing very slow evaluation times when using the Trainer API together with XLA in Google Colab. While the training epochs run at a good speed, evaluation after each epoch takes a very long time. I’ve tried restricting the dataset size and the tokenization max length, with no success. I’m also not sure how to check whether XLA is actually being used during evaluation.
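One way to check whether XLA is in play is to ask torch_xla for the current device. This is a minimal sketch, assuming torch_xla is installed as in a Colab TPU runtime; when it is not, the import simply fails:

```python
# Hedged sketch: confirm whether an XLA (TPU) device is available.
# Assumes torch_xla is installed, as in a Colab TPU runtime.
try:
    import torch_xla.core.xla_model as xm
    device = str(xm.xla_device())  # e.g. "xla:0"
except ImportError:
    device = None  # torch_xla not installed: not running on XLA

print(device)
```

Tensors the Trainer places on this device (rather than on `cpu` or `cuda`) indicate that XLA is being used.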
The task I am working on is NLI, using multi-nli from datasets.
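One common cause of slow TPU evaluation is variable tensor shapes: XLA recompiles the graph for every new shape, so padding every example to one fixed length usually helps. A hedged sketch, assuming the standard transformers tokenizer call signature and the `premise`/`hypothesis` columns of multi-nli (the tokenizer itself is passed in, so nothing here is specific to one checkpoint):

```python
# Hedged sketch: tokenize NLI pairs to a single fixed length so that
# every batch has the same shape on the TPU (one XLA compilation
# instead of one per distinct sequence length).
def tokenize_fn(batch, tokenizer, max_length=128):
    return tokenizer(
        batch["premise"],
        batch["hypothesis"],
        padding="max_length",  # pad everything to max_length
        truncation=True,
        max_length=max_length,
    )
```

This would typically be applied via `dataset.map(...)` before handing the dataset to the Trainer.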
To reproduce
Execute this notebook
https://colab.research.google.com/drive/1dVEfoxGvMAKd0GLnrUJSHZycGtyKt9mr?usp=sharing
Expected behavior
Evaluation speed should be approximately the same as training.
Issue Analytics
- State:
- Created 3 years ago
- Comments: 5 (2 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Ok, I followed this notebook (T5 on TPU) and I managed to solve that error by using start_method="fork" on xmp.spawn. Thanks for your help @sgugger! The notebook with the full code is here.
I don’t know of any easier way than launching the training function (in PyTorch). If you come across an easy example, please let me know and we will try to make the Trainer as easy to use.