
CPU-based pre-trained model

See original GitHub issue

I am guessing that the provided model is for machines with a CUDA-capable device. Do you happen to have a pre-trained CPU version of cnndm_model.bin? I patched decode_seq2seq.py to load the checkpoint onto the CPU:

@@ -165,7 +165,7 @@ def main():
     print(args.model_recover_path)
     for model_recover_path in glob.glob(args.model_recover_path.strip()):
         logger.info("***** Recover model: %s *****", model_recover_path)
-        model_recover = torch.load(model_recover_path)
+        model_recover = torch.load(model_recover_path, map_location="cpu")
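
There is no separate CPU checkpoint needed for this step: a checkpoint saved on a GPU loads on a CPU-only machine as long as its storages are remapped, which is exactly what map_location="cpu" does. A minimal sketch of what the patched call returns (file name and printout are illustrative only):

import torch

# remap every storage in the GPU-saved checkpoint onto the CPU
state_dict = torch.load("cnndm_model.bin", map_location="cpu")
print(len(state_dict), "tensors recovered")  # typically an OrderedDict of weight tensors

With that patch in place, the README decoding command still fails, because --fp16 --amp make apex touch the CUDA runtime during initialization: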
DATA_DIR=../cnndm_data
MODEL_RECOVER_PATH=../cnndm_model.bin
EVAL_SPLIT=test
export PYTORCH_PRETRAINED_BERT_CACHE=/tmp/bert-cased-pretrained-cache
# run decoding
python biunilm/decode_seq2seq.py --fp16 --amp --bert_model bert-large-cased --new_segment_ids --mode s2s --need_score_traces \
  --input_file ${DATA_DIR}/${EVAL_SPLIT}.src --split ${EVAL_SPLIT} --tokenized_input \
  --model_recover_path ${MODEL_RECOVER_PATH} \
  --max_seq_length 768 --max_tgt_length 128 \
  --batch_size 64 --beam_size 5 --length_penalty 0 \
  --forbid_duplicate_ngrams --forbid_ignore_word ".|[X_SEP]"
11/04/2019 15:55:06 - INFO - pytorch_pretrained_bert.tokenization -   loading vocabulary file https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-cased-vocab.txt from cache at /tmp/bert-cased-pretrained-cache/cee054f6aafe5e2cf816d2228704e326446785f940f5451a5b26033516a4ac3d.e13dbb970cb325137104fb2e5f36fe865f27746c6b526f6352861b1980eb80b1
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=51 error=38 : no CUDA-capable device is detected
Traceback (most recent call last):
  File "biunilm/decode_seq2seq.py", line 254, in <module>
    main()
  File "biunilm/decode_seq2seq.py", line 147, in main
    amp_handle = amp.init(enable_caching=True)
  File "/home/john/.virtualenvs/unilm/lib/python3.6/site-packages/apex/amp/amp.py", line 65, in init
    handle = AmpHandle(enable_caching, verbose)
  File "/home/john/.virtualenvs/unilm/lib/python3.6/site-packages/apex/amp/handle.py", line 14, in __init__
    self._default_scaler = LossScaler()
  File "/home/john/.virtualenvs/unilm/lib/python3.6/site-packages/apex/amp/scaler.py", line 35, in __init__
    self._overflow_buf = torch.cuda.IntTensor([0])
  File "/home/john/.virtualenvs/unilm/lib/python3.6/site-packages/torch/cuda/__init__.py", line 163, in _lazy_init
    torch._C._cuda_init()
RuntimeError: cuda runtime error (38) : no CUDA-capable device is detected at /pytorch/aten/src/THC/THCGeneral.cpp:51
[1]    72305 exit 1     python biunilm/decode_seq2seq.py --fp16 --amp --bert_model bert-large-cased
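
The traceback shows why: apex's LossScaler allocates a torch.cuda.IntTensor the moment amp is initialized, so passing --amp is enough to touch the CUDA runtime before the model is even loaded. A guard along these lines (a sketch, not the repo's code) avoids that path on CPU-only machines:

import torch

# decide up front whether the fp16/amp path is even possible here;
# apex's LossScaler creates a torch.cuda.IntTensor at init time
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
amp_possible = device.type == "cuda"
print("running on %s; amp usable: %s" % (device, amp_possible))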

Without --amp (but still with --fp16):

Traceback (most recent call last):
  File "biunilm/decode_seq2seq.py", line 254, in <module>
    main()
  File "biunilm/decode_seq2seq.py", line 216, in main
    position_ids, input_mask, task_idx=task_idx, mask_qkv=mask_qkv)
  File "/home/john/.virtualenvs/unilm/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/john/code/unilm/src/pytorch_pretrained_bert/modeling.py", line 1409, in forward
    return self.beam_search(input_ids, token_type_ids, position_ids, attention_mask, task_idx=task_idx, mask_qkv=mask_qkv)
  File "/home/john/code/unilm/src/pytorch_pretrained_bert/modeling.py", line 1528, in beam_search
    output_all_encoded_layers=True, prev_embedding=prev_embedding, prev_encoded_layers=prev_encoded_layers, mask_qkv=mask_qkv)
  File "/home/john/.virtualenvs/unilm/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/john/code/unilm/src/pytorch_pretrained_bert/modeling.py", line 1062, in forward
    input_ids, token_type_ids, attention_mask)
  File "/home/john/code/unilm/src/pytorch_pretrained_bert/modeling.py", line 1037, in get_extended_attention_mask
    extended_attention_mask = (1.0 - extended_attention_mask) * -10000.0
  File "/home/john/.virtualenvs/unilm/lib/python3.6/site-packages/torch/tensor.py", line 371, in __rsub__
    return _C._VariableFunctions.rsub(self, other)
RuntimeError: "add_cpu" not implemented for 'Half'

Packages:

pytorch-pretrained-bert  0.4.0
torch                    1.1.0
tensorboardX             1.9
apex                     0.1

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 11 (2 by maintainers)

Top GitHub Comments

7 reactions
johnyoonh commented, Nov 10, 2019

Thanks aretius

The following changes made it work. Lowering batch_size and beam_size is also crucial.

DATA_DIR=./cnndm_data                                                                                                                                                         
MODEL_RECOVER_PATH=./cnndm_model.bin
EVAL_SPLIT=test
PYTORCH_PRETRAINED_BERT_CACHE=~/tmp/bert-cased-pretrained-cache
python src/biunilm/decode_seq2seq.py --bert_model bert-large-cased --new_segment_ids --mode s2s --need_score_traces \
  --input_file ${DATA_DIR}/${EVAL_SPLIT}.src --split ${EVAL_SPLIT} --tokenized_input \
  --model_recover_path ${MODEL_RECOVER_PATH} \
  --max_seq_length 768 --max_tgt_length 128 \
  --batch_size 1 --beam_size 2 --length_penalty 0 \
  --forbid_duplicate_ngrams --forbid_ignore_word ".|[X_SEP]"
--- a/src/pytorch_pretrained_bert/__init__.py
+++ b/src/pytorch_pretrained_bert/__init__.py
@@ -3,5 +3,5 @@ from .tokenization import BertTokenizer, BasicTokenizer, WordpieceTokenizer
 from .modeling import (BertConfig, BertModel, BertForPreTraining, BertForMaskedLM, BertForNextSentencePrediction, BertForSequenceClassification,
                        BertForMultipleChoice, BertForTokenClassification, BertForQuestionAnswering, BertForPreTrainingLossMask, BertPreTrainingPairRel, BertPreTrainingPairTransform)
 from .optimization import BertAdam, BertAdamFineTune
-from .optimization_fp16 import FP16_Optimizer_State
+# from .optimization_fp16 import FP16_Optimizer_State
 from .file_utils import PYTORCH_PRETRAINED_BERT_CACHE
diff --git a/src/pytorch_pretrained_bert/modeling.py b/src/pytorch_pretrained_bert/modeling.py
6 reactions
aretius commented, Nov 5, 2019

Hey @johnyoonh, to run in CPU mode you should run decode_seq2seq.py without --amp and --fp16. Also, don't install apex. Wherever the code imports apex, just comment that part out; apex is only there to make things faster and isn't necessary for inference. Also stick to the library versions suggested in the README.md. I spent a whole day figuring out CPU inference for UniLM, so if you have more doubts, do tell me.
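
Instead of commenting the apex imports out by hand, an optional-import guard achieves the same thing. A sketch (the maybe_init_amp helper is hypothetical, not part of the repo):

# make the apex dependency optional rather than deleting the import
try:
    from apex import amp          # only importable where apex is installed
except ImportError:
    amp = None                    # CPU-only environment: plain fp32 path

def maybe_init_amp(use_amp):
    # hypothetical helper: touch apex only when installed and requested
    if not use_amp:
        return None
    if amp is None:
        raise RuntimeError("--amp requested but apex is not installed")
    return amp.init(enable_caching=True)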
