RuntimeError: Expected all tensors to be on the same device

Code:

from sentence_transformers.losses import CosineSimilarityLoss
from setfit import SetFitModel, SetFitTrainer

# `ds` (a dataset with "train" and "test" splits) and `num_classes` are defined elsewhere.
model = SetFitModel.from_pretrained(
    "sentence-transformers/paraphrase-mpnet-base-v2",
    use_differentiable_head=True,
    head_params={"out_features": num_classes},
)

trainer = SetFitTrainer(
    model=model,
    train_dataset=ds["train"],
    eval_dataset=ds["test"],
    loss_class=CosineSimilarityLoss,
    metric="accuracy",
    batch_size=64,
    num_iterations=10,  # The number of text pairs to generate for contrastive learning
    num_epochs=3,  # The number of epochs to use for contrastive learning
    learning_rate=1.25e-05,
    column_mapping={"text": "text", "label": "label"},  # Map dataset columns to text/label expected by trainer
)

# Freeze the head, then unfreeze it with the body kept frozen.
# Note that the body is not trained in between.
trainer.freeze()
trainer.unfreeze(keep_body_frozen=True)

trainer.train(
    num_epochs=25,  # The number of epochs to train the head or the whole model (body and head)
    batch_size=64,
    body_learning_rate=1.25e-5,  # The body's learning rate
    learning_rate=1e-3,  # The head's learning rate
    l2_weight=0.0,  # Weight decay on **both** the body and head. If `None`, will use 0.01.
)

Error:

File ~/github.com/julep-ai/monorepo/lab/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py:1190, in Module._call_impl(self, *input, **kwargs)
   1186 # If we don't have any hooks, we want to skip the rest of the logic in
   1187 # this function, and just call forward.
   1188 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1189         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1190     return forward_call(*input, **kwargs)
   1191 # Do not call functions when jit is used
   1192 full_backward_hooks, non_full_backward_hooks = [], []

File ~/github.com/julep-ai/monorepo/lab/.venv/lib/python3.8/site-packages/torch/nn/modules/linear.py:114, in Linear.forward(self, input)
    113 def forward(self, input: Tensor) -> Tensor:
--> 114     return F.linear(input, self.weight, self.bias)

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat1 in method wrapper_addmm)

What fixed it:

model.model_body = model.model_body.to("cuda:0")
model.model_head = model.model_head.to("cuda:0")
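
A slightly more defensive version of this workaround is sketched below. It reports which devices each part's parameters live on, then moves both parts to one device. The helper names `parameter_devices` and `move_body_and_head` are hypothetical and not part of SetFit. The sketch only assumes that the model exposes `model_body` and `model_head` as in the lines above, and that each behaves like a `torch.nn.Module` (it has `.parameters()` and `.to()`).

```python
def parameter_devices(module):
    """Return the sorted device names that a module's parameters live on."""
    return sorted({str(p.device) for p in module.parameters()})


def move_body_and_head(model, device):
    """Move both parts of a SetFit-style model to `device` and return the model.

    Assumption: `model.model_body` and `model.model_head` behave like
    torch.nn.Module objects, whose .to() returns the moved module.
    """
    model.model_body = model.model_body.to(device)
    model.model_head = model.model_head.to(device)
    return model
```

With the model from the issue, different results from `parameter_devices(model.model_body)` and `parameter_devices(model.model_head)` would confirm the mismatch. Calling `move_body_and_head(model, "cuda:0" if torch.cuda.is_available() else "cpu")` before `trainer.train(...)` should apply the same fix without hard-coding the GPU.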

Issue Analytics

  • State: closed
  • Created 10 months ago
  • Reactions: 1
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

blakechi commented, Nov 12, 2022 (3 reactions)

Okay I think I know where the bug is.

So sentence-transformers initializes the model on CPU by default and moves it to the GPU, if one is available, when we call fit (Ref 1, Ref 2).

That’s why it works smoothly when we first train the model body and then train the head. If we train the head directly, the body hasn’t been moved to the GPU yet, so the error is raised.

Hope the explanation makes sense! 😃

I can fix it by moving the body to the _target_device right after its initialization. Will open a PR soon. Thank you @creatorrr for finding this bug!
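
The control flow described in this comment can be illustrated with a toy simulation. This is not the sentence-transformers or SetFit code: `ToyBody`, `train_head`, and the `move_on_init` flag are hypothetical names, and devices are plain strings rather than real tensors. It only sketches why training the head first fails and why moving the body at initialization would avoid it.

```python
class ToyBody:
    """Stand-in for the model body (hypothetical, not the real class)."""

    def __init__(self, target_device, move_on_init=False):
        self.target_device = target_device
        # Starts on CPU; moved eagerly only when the proposed fix is enabled.
        self.device = target_device if move_on_init else "cpu"

    def fit(self):
        # Training the body is what moves it to the target device.
        self.device = self.target_device


def train_head(body, head_device):
    """Simulate the head's forward pass, which needs body and head on one device."""
    if body.device != head_device:
        raise RuntimeError(
            "Expected all tensors to be on the same device, "
            f"but found {head_device} and {body.device}"
        )
    return "ok"
```

In this toy, calling `fit()` on a `ToyBody("cuda:0")` before `train_head(body, "cuda:0")` succeeds, while skipping `fit()` raises the `RuntimeError`. Constructing the body with `move_on_init=True` mirrors the proposed fix and succeeds without a prior `fit()`.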

tomaarsen commented, Dec 13, 2022 (1 reaction)

When running (a slightly modified variant of) your script, it now works as intended since blake’s changes, so I’ll close this!
