RuntimeError: Expected all tensors to be on the same device
Code:
from sentence_transformers.losses import CosineSimilarityLoss
from setfit import SetFitModel, SetFitTrainer

model = SetFitModel.from_pretrained(
    "sentence-transformers/paraphrase-mpnet-base-v2",
    use_differentiable_head=True,
    head_params={"out_features": num_classes},
)
trainer = SetFitTrainer(
    model=model,
    train_dataset=ds["train"],
    eval_dataset=ds["test"],
    loss_class=CosineSimilarityLoss,
    metric="accuracy",
    batch_size=64,
    num_iterations=10,  # The number of text pairs to generate for contrastive learning
    num_epochs=3,  # The number of epochs to use for contrastive learning
    learning_rate=1.25e-05,
    column_mapping={"text": "text", "label": "label"},  # Map dataset columns to the text/label names expected by the trainer
)
trainer.freeze()  # Freeze the head
trainer.unfreeze(keep_body_frozen=True)  # Unfreeze the head but keep the body frozen, so only the head is trained
trainer.train(
    num_epochs=25,  # The number of epochs to train the head or the whole model (body and head)
    batch_size=64,
    body_learning_rate=1.25e-5,  # The body's learning rate
    learning_rate=1e-3,  # The head's learning rate
    l2_weight=0.0,  # Weight decay on **both** the body and head. If `None`, will use 0.01.
)
Error:
File ~/github.com/julep-ai/monorepo/lab/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py:1190, in Module._call_impl(self, *input, **kwargs)
1186 # If we don't have any hooks, we want to skip the rest of the logic in
1187 # this function, and just call forward.
1188 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1189 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1190 return forward_call(*input, **kwargs)
1191 # Do not call functions when jit is used
1192 full_backward_hooks, non_full_backward_hooks = [], []
File ~/github.com/julep-ai/monorepo/lab/.venv/lib/python3.8/site-packages/torch/nn/modules/linear.py:114, in Linear.forward(self, input)
113 def forward(self, input: Tensor) -> Tensor:
--> 114 return F.linear(input, self.weight, self.bias)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat1 in method wrapper_addmm)
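A quick way to confirm the mismatch before applying any fix (a sketch; it assumes the `model` object from the snippet above, with the differentiable torch-based head):
# Print the device each half of the model lives on; with the setup above this
# typically shows one half on cpu and the other on cuda:0.
print(next(model.model_body.parameters()).device)
print(next(model.model_head.parameters()).device)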
What fixed it:
model.model_body = model.model_body.to("cuda:0")
model.model_head = model.model_head.to("cuda:0")
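The same workaround can be written device-agnostically (a sketch, not part of the original report; it just avoids hard-coding `cuda:0`):
import torch

# Pick the GPU when one is available, otherwise stay on the CPU, and move both
# halves of the SetFit model to that device before training the head.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model.model_body = model.model_body.to(device)
model.model_head = model.model_head.to(device)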
Top GitHub Comments
Okay, I think I know where the bug is.

sentence-transformers initializes the model on the CPU by default and only moves it to the GPU (if available) when we call fit (Ref 1, Ref 2). That is why everything works smoothly when we first train the model body and then train the head. If we train the head directly, the body has not been moved to the GPU yet, so the error is raised.

Hope the explanation makes sense! 😃

I can fix it by putting the body on the _target_device right after its initialization. Will open a PR soon. Thank you @creatorrr for finding this bug!

When running (a slightly modified variant of) your script, it now works as intended since blake's changes, so I'll close this!
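Until that fix lands in a release, the same idea can be applied from user code (a sketch; `_target_device` is an internal sentence-transformers attribute mentioned in the comment above and may not exist in every version, hence the fallback):
import torch

# Move the body to the device sentence-transformers intends to use; fall back
# to an explicit choice if the internal attribute is missing in this version.
target = getattr(
    model.model_body,
    "_target_device",
    torch.device("cuda:0" if torch.cuda.is_available() else "cpu"),
)
model.model_body = model.model_body.to(target)
model.model_head = model.model_head.to(target)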