RuntimeError: Expected all tensors to be on the same device
Code:
from sentence_transformers.losses import CosineSimilarityLoss
from setfit import SetFitModel, SetFitTrainer

model = SetFitModel.from_pretrained(
    "sentence-transformers/paraphrase-mpnet-base-v2",
    use_differentiable_head=True,
    head_params={"out_features": num_classes},
)
trainer = SetFitTrainer(
    model=model,
    train_dataset=ds["train"],
    eval_dataset=ds["test"],
    loss_class=CosineSimilarityLoss,
    metric="accuracy",
    batch_size=64,
    num_iterations=10,  # The number of text pairs to generate for contrastive learning
    num_epochs=3,  # The number of epochs to use for contrastive learning
    learning_rate=1.25e-05,
    column_mapping={"text": "text", "label": "label"},  # Map dataset columns to the text/label names expected by the trainer
)
trainer.freeze()  # Freeze the head
trainer.unfreeze(keep_body_frozen=True)  # Unfreeze the head but keep the body frozen, so only the head is trained
trainer.train(
    num_epochs=25,  # The number of epochs to train the head or the whole model (body and head)
    batch_size=64,
    body_learning_rate=1.25e-5,  # The body's learning rate
    learning_rate=1e-3,  # The head's learning rate
    l2_weight=0.0,  # Weight decay on **both** the body and head. If `None`, will use 0.01.
)
Error:
File ~/github.com/julep-ai/monorepo/lab/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py:1190, in Module._call_impl(self, *input, **kwargs)
1186 # If we don't have any hooks, we want to skip the rest of the logic in
1187 # this function, and just call forward.
1188 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1189 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1190 return forward_call(*input, **kwargs)
1191 # Do not call functions when jit is used
1192 full_backward_hooks, non_full_backward_hooks = [], []
File ~/github.com/julep-ai/monorepo/lab/.venv/lib/python3.8/site-packages/torch/nn/modules/linear.py:114, in Linear.forward(self, input)
113 def forward(self, input: Tensor) -> Tensor:
--> 114 return F.linear(input, self.weight, self.bias)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat1 in method wrapper_addmm)
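A quick way to confirm the mismatch before applying any fix (a sketch; it assumes the `model` object from the snippet above, with the differentiable torch-based head):
# Print the device each half of the model lives on; with the setup above this
# typically shows one half on cpu and the other on cuda:0.
print(next(model.model_body.parameters()).device)
print(next(model.model_head.parameters()).device)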
What fixed it:
model.model_body = model.model_body.to("cuda:0")
model.model_head = model.model_head.to("cuda:0")
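The same workaround can be written device-agnostically (a sketch, not part of the original report; it just avoids hard-coding `cuda:0`):
import torch

# Pick the GPU when one is available, otherwise stay on the CPU, and move both
# halves of the SetFit model to that device before training the head.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model.model_body = model.model_body.to(device)
model.model_head = model.model_head.to(device)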
Top GitHub Comments
Okay, I think I know where the bug is.

sentence-transformers initializes the model on the CPU by default and only moves it to the GPU (if available) when we call fit (Ref 1, Ref 2). That is why everything works smoothly when we first train the model body and then train the head. If we train the head directly, the body has not been moved to the GPU yet, so the error is raised.

Hope the explanation makes sense! 😃

I can fix it by putting the body on the _target_device right after its initialization. Will open a PR soon. Thank you @creatorrr for finding this bug!

When running (a slightly modified variant of) your script, it now works as intended since blake's changes, so I'll close this!
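Until that fix lands in a release, the same idea can be applied from user code (a sketch; `_target_device` is an internal sentence-transformers attribute mentioned in the comment above and may not exist in every version, hence the fallback):
import torch

# Move the body to the device sentence-transformers intends to use; fall back
# to an explicit choice if the internal attribute is missing in this version.
target = getattr(
    model.model_body,
    "_target_device",
    torch.device("cuda:0" if torch.cuda.is_available() else "cpu"),
)
model.model_body = model.model_body.to(target)
model.model_head = model.model_head.to(target)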