🐛 Trainer on TPU: KeyError '__getstate__'
🐛 Bug
Information
Model I am using : ELECTRA base
Language I am using the model on : English
The problem arises when using:
- the official example scripts
- my own modified scripts
The task I am working on is:
- an official GLUE/SQuAD task
- my own task or dataset
To reproduce
I’m trying to fine-tune a model on a Colab TPU using the new Trainer API, but I’m struggling.
Here is a self-contained Colab notebook to reproduce the error (it’s a dummy example).
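For reference, here is a rough sketch of the shape of that notebook (hedged: the checkpoint name, dummy data, hyperparameters, and the sequence-classification head are placeholders, not necessarily what the linked notebook uses):

```python
# Hedged sketch of a Trainer fine-tuning setup on a Colab TPU runtime.
# Assumptions: the ELECTRA checkpoint, the dummy texts/labels, and the
# sequence-classification head are placeholders; the linked notebook may differ.
import torch
from torch.utils.data import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

checkpoint = "google/electra-base-discriminator"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)


class DummyDataset(Dataset):
    """Tiny in-memory dataset built from the tokenizer's output."""

    def __init__(self, texts, labels):
        self.encodings = tokenizer(texts, truncation=True, padding=True)
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item


train_dataset = DummyDataset(["a good example", "a bad example"] * 8, [1, 0] * 8)

# With torch_xla installed (as on a Colab TPU runtime), Trainer places the model
# on the XLA device and feeds batches through torch_xla's parallel loader;
# that TPU data-loading path is where the error below is raised.
args = TrainingArguments(output_dir="electra-tpu-dummy", num_train_epochs=1)
trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()
```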
When running the notebook, I get the following error:
File "/usr/local/lib/python3.6/dist-packages/transformers/tokenization_utils.py", line 199, in __getattr__
return self.data[item]
KeyError: '__getstate__'
Full stack trace:
Exception in thread Thread-3:
Traceback (most recent call last):
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch_xla/distributed/parallel_loader.py", line 172, in _worker
batch = xm.send_cpu_data_to_device(batch, device)
File "/usr/local/lib/python3.6/dist-packages/torch_xla/core/xla_model.py", line 624, in send_cpu_data_to_device
return ToXlaTensorArena(convert_fn, select_fn).transform(data)
File "/usr/local/lib/python3.6/dist-packages/torch_xla/core/xla_model.py", line 307, in transform
return self._replace_tensors(inputs)
File "/usr/local/lib/python3.6/dist-packages/torch_xla/core/xla_model.py", line 301, in _replace_tensors
convert_fn)
File "/usr/local/lib/python3.6/dist-packages/torch_xla/utils/utils.py", line 199, in for_each_instance_rewrite
return _for_each_instance_rewrite(value, select_fn, fn, rwmap)
File "/usr/local/lib/python3.6/dist-packages/torch_xla/utils/utils.py", line 179, in _for_each_instance_rewrite
result.append(_for_each_instance_rewrite(x, select_fn, fn, rwmap))
File "/usr/local/lib/python3.6/dist-packages/torch_xla/utils/utils.py", line 187, in _for_each_instance_rewrite
result = copy.copy(value)
File "/usr/lib/python3.6/copy.py", line 96, in copy
rv = reductor(4)
File "/usr/local/lib/python3.6/dist-packages/transformers/tokenization_utils.py", line 199, in __getattr__
return self.data[item]
KeyError: '__getstate__'
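For what it's worth, the traceback suggests the batch handed to torch_xla's ParallelLoader is a BatchEncoding, whose __getattr__ (transformers 2.9.0, tokenization_utils.py line 199) forwards unknown attribute lookups to its underlying data dict. When copy.copy probes the object for __getstate__, that lookup raises KeyError instead of the AttributeError the copy machinery expects. A minimal, self-contained sketch of that pattern (FakeBatchEncoding is illustrative, not the real class):

```python
import copy
from collections import UserDict


class FakeBatchEncoding(UserDict):
    """Illustrative stand-in for the 2.9.0-era BatchEncoding (not the real class)."""

    def __getattr__(self, item):
        # Problematic pattern: every missing attribute lookup is routed to the
        # underlying data dict, so it raises KeyError where callers such as
        # copy.copy expect AttributeError.
        return self.data[item]


enc = FakeBatchEncoding({"input_ids": [101, 2023, 102]})

# getattr() with a default only swallows AttributeError, so the KeyError leaks out:
try:
    getattr(enc, "some_missing_attribute", None)
except KeyError as err:
    print("KeyError instead of AttributeError:", err)

# copy.copy() makes the same kind of probe (for __getstate__ on Python 3.6),
# which is what torch_xla's send_cpu_data_to_device hits in the trace above.
# (Behaviour can differ on newer Pythons, which define object.__getstate__.)
try:
    copy.copy(enc)
except KeyError as err:
    print("copy.copy failed with KeyError:", err)
```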
Any hint on how to make this dummy example work would be welcome.
Environment info
- transformers version: 2.9.0
- Platform: Linux-4.19.104+-x86_64-with-Ubuntu-18.04-bionic
- Python version: 3.6.9
- PyTorch version (GPU?): 1.6.0a0+cf82011 (False)
- Tensorflow version (GPU?): 2.2.0 (False)
- Using GPU in script?: No
- Using distributed or parallel set-up in script?: No
Top GitHub Comments
Cc @mfuntowicz
Indeed, this should have been fixed in the v3+ versions. Thanks for opening an issue @Colanim.
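For anyone hitting this on transformers 2.x, a hedged sketch of the possible mitigations (the unwrap-to-a-plain-dict trick below is an assumption based on the failure mode above, not an officially documented workaround):

```python
# Preferred: upgrade to a transformers release containing the fix, e.g.
#   pip install --upgrade "transformers>=3.0.0"
#
# Hypothetical workaround while staying on 2.9.0: make sure the examples the
# Trainer collates are plain dicts of tensors rather than BatchEncoding
# objects, so torch_xla's ParallelLoader never tries to copy a BatchEncoding.
def unwrap_batch_encoding(encoding):
    # BatchEncoding is a mapping, so dict() copies its contents into a plain dict.
    return dict(encoding)
```

The unwrap step would go wherever the tokenizer output is returned (e.g. the dataset's __getitem__ or a custom data collator), so that the batch reaching the TPU loader is an ordinary dict.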