Error when making predictions on CPU after training the model on GPU
Hi, I trained the model on GPU according to the tutorial:
reader = BertQA(bert_model='bert-base-multilingual-cased',
                train_batch_size=256,
                learning_rate=3e-5,
                num_train_epochs=2,
                do_lower_case=False,
                verbose_logging=True,
                output_dir='./temp')
reader.fit(X=(train_examples, train_features))
And before dumping the model, I sent it to CPU:
reader.model.to('cpu')
reader.device = torch.device('cpu')
But when I try to make a prediction on CPU, the following error occurs:
query = 'some sample query...'
prediction = cdqa_pipeline.predict(X=query)
--------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-79-c881b3585457> in <module>
1 query = 'some sample query...'
----> 2 prediction = cdqa_pipeline.predict(X=query)
~/anaconda3/lib/python3.7/site-packages/cdqa/pipeline/cdqa_sklearn.py in predict(self, X, return_logit)
158 metadata=self.metadata)
159 examples, features = self.processor_predict.fit_transform(X=squad_examples)
--> 160 prediction = self.reader.predict((examples, features), return_logit)
161 return prediction
162
~/anaconda3/lib/python3.7/site-packages/cdqa/reader/bertqa_sklearn.py in predict(self, X, return_logit)
1220 with torch.no_grad():
1221 batch_start_logits, batch_end_logits = self.model(
-> 1222 input_ids, segment_ids, input_mask)
1223 for i, example_index in enumerate(example_indices):
1224 start_logits = batch_start_logits[i].detach().cpu().tolist()
~/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
491 result = self._slow_forward(*input, **kwargs)
492 else:
--> 493 result = self.forward(*input, **kwargs)
494 for hook in self._forward_hooks.values():
495 hook_result = hook(self, input, result)
~/anaconda3/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py in forward(self, *inputs, **kwargs)
144 raise RuntimeError("module must have its parameters and buffers "
145 "on device {} (device_ids[0]) but found one of "
--> 146 "them on device: {}".format(self.src_device_obj, t.device))
147
148 inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids)
RuntimeError: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cpu
Is there something else I need to do?
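For context on where this check comes from: torch.nn.DataParallel.forward verifies that the wrapped module's parameters and buffers still live on device_ids[0] before scattering inputs across GPUs. A minimal standalone repro, independent of cdQA and assuming a machine with at least one CUDA device (net is an illustrative name, not a cdQA attribute):

import torch
import torch.nn as nn

# Wrap a tiny module the same way multi-GPU training wraps the reader;
# DataParallel records cuda:0 as its source device at construction time.
net = nn.DataParallel(nn.Linear(4, 2).cuda())

# Moving the wrapper to CPU moves the parameters, but forward() still
# compares them against the recorded cuda:0 source device...
net.to('cpu')

# ...so this call raises: "module must have its parameters and buffers
# on device cuda:0 (device_ids[0]) but found one of them on device: cpu"
net(torch.randn(1, 4))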
Top GitHub Comments
Hi, andrelmfarias. I used 8 GPUs (distributed training).
But now I understand what's wrong. Thanks for your help, andrelmfarias.
I just tried to train a new model, and when I print
type(model.model)
I do not get torch.nn.parallel.data_parallel.DataParallel.
…Did you train the model on multiple GPUs with distributed training?
Thanks
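For anyone hitting the same error: multi-GPU training leaves reader.model wrapped in torch.nn.DataParallel, so moving the wrapper to CPU trips the device check shown in the traceback. A minimal sketch of the fix, assuming the reader object from the snippets in this issue, is to unwrap the underlying module before leaving CUDA:

import torch

# The real network lives in the .module attribute of the DataParallel
# wrapper; unwrap it so predict() calls the bare module directly.
if isinstance(reader.model, torch.nn.DataParallel):
    reader.model = reader.model.module

reader.model.to('cpu')              # move all parameters and buffers to CPU
reader.device = torch.device('cpu')

Once unwrapped, DataParallel.forward (and its cuda:0 device check) is no longer in the call path, so CPU inference proceeds normally.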