BertLayerNorm not loaded in CPU mode

I am running into an exception when loading a model on CPU in one of the example scripts. I suppose this is related to loading the FusedLayerNorm from apex, even when --no_cuda has been set. https://github.com/huggingface/pytorch-pretrained-BERT/blob/8da280ebbeca5ebd7561fd05af78c65df9161f92/pytorch_pretrained_bert/modeling.py#L154

Or is this working for anybody else?

Example:

run_classifier.py --data_dir glue/CoLA --task_name CoLA --do_train --do_eval --bert_model bert-base-cased --max_seq_length 32 --train_batch_size 12 --learning_rate 2e-5 --num_train_epochs 2.0 --output_dir /tmp/mrpc_output/ --no_cuda

Exception:

[...]
File "/home/mp/miniconda3/envs/bert/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/apex/normalization/fused_layer_norm.py", line 19, in forward
    input_, self.normalized_shape, weight_, bias_, self.eps)
RuntimeError: input must be a CUDA tensor (layer_norm_affine at apex/normalization/csrc/layer_norm_cuda.cpp:120)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x45 (0x7fe35f6e4cc5 in /home/mp/miniconda3/envs/bert/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: layer_norm_affine(at::Tensor, c10::ArrayRef<long>, at::Tensor, at::Tensor, double) + 0x4bc (0x7fe3591456ac in /home/mp/miniconda3/envs/bert/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #2: <unknown function> + 0x18db4 (0x7fe359152db4 in /home/mp/miniconda3/envs/bert/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #3: <unknown function> + 0x16505 (0x7fe359150505 in /home/mp/miniconda3/envs/bert/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so)
<omitting python frames>
frame #12: THPFunction_do_forward(THPFunction*, _object*) + 0x15c (0x7fe38fb7db7c in /home/mp/miniconda3/envs/bert/lib/python3.6/site-packages/torch/lib/libtorch_python.so)

Issue Analytics

State:
Created 5 years ago
Comments: 16 (7 by maintainers)

Top GitHub Comments

3 reactions

thomwolf commented, Jan 10, 2019

I see. It’s a bit tricky because apex is loaded by default whenever it can be found, and this loading happens deep inside the library itself, not in the examples. I don’t think it’s worth adding specific logic to the library’s loading code to handle such a case.

I guess the easiest solution in your case is to have two Python environments (with conda or virtualenv) and switch to the one in which apex is not installed when you don’t want to use the GPU.

Feel free to re-open the issue if this doesn’t solve your problem.
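
If keeping two environments is not practical, one possible alternative is to hide apex from the current process before the BERT library is imported. The sketch below is not part of pytorch-pretrained-BERT or apex; `hide_apex_if_cpu` is a hypothetical helper name. It relies on documented Python behavior (a `None` entry in `sys.modules` makes the import raise `ModuleNotFoundError`, a subclass of `ImportError`) and assumes that the library wraps its apex import in a `try`/`except ImportError` with a plain PyTorch fallback, as the linked modeling.py appears to do.

```python
import sys


def hide_apex_if_cpu(argv):
    """Hypothetical helper: block `import apex` when --no_cuda was passed.

    A None entry in sys.modules makes the import system raise
    ModuleNotFoundError (a subclass of ImportError) for that name.
    Returns True if apex was hidden, False otherwise.
    """
    if "--no_cuda" not in argv:
        return False
    sys.modules["apex"] = None
    return True


if __name__ == "__main__":
    hidden = hide_apex_if_cpu(sys.argv)
    print("apex hidden for this process:", hidden)
    # Assumption: import the BERT library only after this point, so that its
    # own `except ImportError` fallback (if present) is what gets used.
```

The call has to run before the first import of the library, since a module that has already imported apex is not affected. The block lasts only for the current process, so GPU runs in the same environment still pick up apex.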

1 reaction

LamDang commented, Apr 11, 2019

Hello, I pushed a pull request to solve this issue upstream: https://github.com/NVIDIA/apex/pull/256

Update: it is merged into apex
