question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

BertLayerNorm not loaded in CPU mode

See original GitHub issue

I am running into an exception when loading a model on CPU in one of the example scripts. I suppose this is related to loading the FusedLayerNorm from apex, even when --no_cuda has been set. https://github.com/huggingface/pytorch-pretrained-BERT/blob/8da280ebbeca5ebd7561fd05af78c65df9161f92/pytorch_pretrained_bert/modeling.py#L154

Or is this working for anybody else?

Example:

run_classifier.py --data_dir glue/CoLA --task_name CoLA --do_train --do_eval --bert_model bert-base-cased --max_seq_length 32 --train_batch_size 12 --learning_rate 2e-5 --num_train_epochs 2.0 --output_dir /tmp/mrpc_output/ --no_cuda

Exception:

[...]
File "/home/mp/miniconda3/envs/bert/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/apex/normalization/fused_layer_norm.py", line 19, in forward
    input_, self.normalized_shape, weight_, bias_, self.eps)
RuntimeError: input must be a CUDA tensor (layer_norm_affine at apex/normalization/csrc/layer_norm_cuda.cpp:120)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x45 (0x7fe35f6e4cc5 in /home/mp/miniconda3/envs/bert/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: layer_norm_affine(at::Tensor, c10::ArrayRef<long>, at::Tensor, at::Tensor, double) + 0x4bc (0x7fe3591456ac in /home/mp/miniconda3/envs/bert/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #2: <unknown function> + 0x18db4 (0x7fe359152db4 in /home/mp/miniconda3/envs/bert/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #3: <unknown function> + 0x16505 (0x7fe359150505 in /home/mp/miniconda3/envs/bert/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so)
<omitting python frames>
frame #12: THPFunction_do_forward(THPFunction*, _object*) + 0x15c (0x7fe38fb7db7c in /home/mp/miniconda3/envs/bert/lib/python3.6/site-packages/torch/lib/libtorch_python.so)

Issue Analytics

  • State:closed
  • Created 5 years ago
  • Comments:16 (7 by maintainers)

github_iconTop GitHub Comments

3reactions
thomwolfcommented, Jan 10, 2019

I see. It’s a bit tricky because apex is loaded by default when it can be found and this loading is deep inside the library it-self, not the examples (here). I don’t think it’s worth it to add specific logic inside the loading of the library to handle such a case.

I guess the easiest solution in your case is to have two python environments (with conda or virtualenv) and switch to the one in which apex is not installed when don’t want to use GPU.

Feel free to re-open the issue if this doesn’t solve your problem.

1reaction
LamDangcommented, Apr 11, 2019

Hello, I pushed a pull request here to solve this issue upstream https://github.com/NVIDIA/apex/pull/256

Update: it is merged into apex

Read more comments on GitHub >

github_iconTop Results From Across the Web

Source code for transformers.modeling_bert
Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the :meth:`~transformers.PreTrainedModel.
Read more >
pytorch model saved from TPU run on CPU
But I would like to run it localy on cpu (for inference, of course) and got this error when pytorch tried to load...
Read more >
Python Examples of torch.load
This page shows Python examples of torch.load.
Read more >
bert_experiment
'/device:GPU:0': raise SystemError('GPU device not found') print('Found GPU at: ... Load BertForSequenceClassification, the pretrained BERT model with a ...
Read more >
nlp_architect.models.transformers.quantized_bert
... pylint: disable=bad-super-call """ Quantized BERT layers and model ... Get state dict of model state_dict = torch.load(model_file, map_location="cpu") ...
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found