torch type mismatch error

With Python 3.6, PyTorch 0.4.1, and CUDA 9.0, I get the following error when I run train.py with the TIMIT example:

$ python train.py examples/timit/seq2seq_config.json
Traceback (most recent call last):
  File "train.py", line 146, in <module>
    run(config)
  File "train.py", line 104, in run
    run_state = run_epoch(model, optimizer, train_ldr, *run_state)
  File "train.py", line 29, in run_epoch
    loss = model.loss(batch)
  File "/path/to/speech/models/seq2seq.py", line 57, in loss
    out, alis = self.forward_impl(x, y)
  File "/path/to/speech/models/seq2seq.py", line 68, in forward_impl
    out, alis = self.decode(x, y)
  File "/path/to/speech/models/seq2seq.py", line 103, in decode
    hx = self.dec_rnn(ix.squeeze(dim=1), hx)
  File "/path/to/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/path/to/lib64/python3.6/site-packages/torch/nn/modules/rnn.py", line 794, in forward
    self.bias_ih, self.bias_hh,
  File "/path/to/lib64/python3.6/site-packages/torch/nn/_functions/rnn.py", line 53, in GRUCell
    gh = F.linear(hidden, w_hh)
  File "/path/to/lib64/python3.6/site-packages/torch/nn/functional.py", line 1026, in linear
    output = input.matmul(weight.t())
RuntimeError: Expected object of type torch.FloatTensor but found type torch.cuda.FloatTensor for argument #2 'mat2'

If I add torch.set_default_tensor_type('torch.cuda.FloatTensor') to the main function, the error becomes:

Traceback (most recent call last):
  File "train.py", line 148, in <module>
    run(config)
  File "train.py", line 110, in run
    dev_loss, dev_cer = eval_dev(model, dev_ldr, preproc)
  File "train.py", line 57, in eval_dev
    preds = model.infer(batch)
  File "/path/to/speech/models/seq2seq.py", line 176, in infer
    _, argmaxs = self.infer_decode(x, y, end_tok, max_len)
  File "/path/to/speech/models/seq2seq.py", line 155, in infer_decode
    if torch.sum(y.data == end_tok) == y.numel():
RuntimeError: Expected object of type torch.cuda.LongTensor but found type torch.LongTensor for argument #2 'other'

Do you have any idea how to solve this?
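
Both tracebacks report two operands of the same operation living on different devices: a CPU tensor meets a CUDA weight in the first, and a CUDA tensor meets a CPU tensor in the second. A minimal sketch for confirming which side is misplaced, assuming PyTorch 0.4 or later; the helper describe_devices and the toy GRUCell are hypothetical and not part of the repository:

```python
import torch
import torch.nn as nn


def describe_devices(module, **tensors):
    """Map each parameter or tensor name to its device type ("cpu" or "cuda")."""
    report = {name: p.device.type for name, p in module.named_parameters()}
    report.update({name: t.device.type for name, t in tensors.items()})
    return report


# Use the GPU when one is available so the sketch also runs on CPU-only machines.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
cell = nn.GRUCell(8, 8).to(device)
ix = torch.randn(4, 8, device=device)
hx = torch.zeros(4, 8)  # created on the CPU and never moved

report = describe_devices(cell, ix=ix, hx=hx)
# Names whose device differs from the one the model was moved to.
mismatched = sorted(name for name, dev in report.items() if dev != device.type)
print(report)
print("mismatched:", mismatched)
```

On a CUDA machine the list of mismatched names contains hx; on a CPU-only machine it is empty because everything already lives on the CPU.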

Created 5 years ago
Comments: 9

Top GitHub Comments

2 reactions

pkyoung commented, Sep 28, 2018

I managed to run train.py, but I have not yet confirmed whether training succeeds. The quick (and dirty) remedy for the error above was:

diff --git a/speech/models/seq2seq.py b/speech/models/seq2seq.py
index b2881e3..65e3a38 100644
--- a/speech/models/seq2seq.py
+++ b/speech/models/seq2seq.py
@@ -87,7 +87,7 @@ class Seq2Seq(model.Model):

         hx = torch.zeros((x.shape[0], x.shape[2]), requires_grad=False)
         if self.is_cuda:
-            hx.cuda()
+            hx = hx.cuda()
         ax = None; sx = None;
         for t in range(y.size()[1] - 1):
             sample = (out and self.scheduled_sampling)
@@ -119,7 +119,7 @@ class Seq2Seq(model.Model):
         if state is None:
             hx = torch.zeros((x.shape[0], x.shape[2]), requires_grad=False)
             if self.is_cuda:
-                hx.cuda()
+                hx = hx.cuda()
             ax = None; sx = None;
         else:
             hx, ax, sx = state
@@ -164,7 +164,7 @@ class Seq2Seq(model.Model):
         Infer a likely output. No beam search yet.
         """
         x, y = self.collate(*batch)
-        end_tok = y.data[0, -1] # TODO
+        end_tok = y.data[0, -1].cuda() # TODO
         t = y
         if self.is_cuda:
             x = x.cuda()
@@ -172,7 +172,7 @@ class Seq2Seq(model.Model):
         x = self.encode(x)

         # needs to be the start token, TODO
-        y = t[:, 0:1]
+        y = t[:, 0:1].cuda()
         _, argmaxs = self.infer_decode(x, y, end_tok, max_len)
         argmaxs = argmaxs.cpu().data.numpy()
         return [seq.tolist() for seq in argmaxs]
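
The first two hunks matter because Tensor.cuda() returns a copy on the GPU and leaves the original tensor where it was, so a bare hx.cuda() has no effect on hx. A less device-specific variant is sketched below, assuming PyTorch 0.4 or later: allocate the hidden state directly on the input's device. The helper name init_hidden is hypothetical and does not exist in the repository.

```python
import torch


def init_hidden(x):
    """Return a zero hidden state on the same device and dtype as x.

    Hypothetical helper; the (batch, features) shape mirrors the
    torch.zeros((x.shape[0], x.shape[2])) call in the diff above.
    """
    return torch.zeros(x.shape[0], x.shape[2], device=x.device, dtype=x.dtype)


# The device is taken from x, so the same code runs on CPU and GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.randn(4, 10, 8, device=device)  # (batch, time, features)
hx = init_hidden(x)

# Tensor.to() and Tensor.cuda() are not in-place: keep the returned tensor.
cpu_state = torch.zeros(4, 8)
moved = cpu_state.to(device)
```

With this pattern the is_cuda branches are not needed for the hidden state, and end_tok and y can follow the same rule by moving them with .to(x.device) instead of an unconditional .cuda().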

There was also an error in train.py:

diff --git a/train.py b/train.py
index a04eb6c..6141ba0 100644
--- a/train.py
+++ b/train.py
@@ -10,6 +10,7 @@ import torch
 import torch.nn as nn
 import torch.optim
 import tqdm
+import copy

 import speech
 import speech.loader as loader
@@ -30,7 +31,7 @@ def run_epoch(model, optimizer, train_ldr, it, avg_loss):
         loss.backward()

         grad_norm = nn.utils.clip_grad_norm(model.parameters(), 200)
-        loss = loss.data[0]
+        loss = loss.item()

         optimizer.step()
         prev_end_t = end_t
@@ -54,11 +55,13 @@ def eval_dev(model, ldr, preproc):
     model.set_eval()

     for batch in tqdm.tqdm(ldr):
-        preds = model.infer(batch)
-        loss = model.loss(batch)
-        losses.append(loss.data[0])
+        batch_ = copy.deepcopy(batch)
+        preds = model.infer(batch_)
+        batch_ = copy.deepcopy(batch)
+        loss = model.loss(batch_)
+        losses.append(loss.item())
         all_preds.extend(preds)
-        all_labels.extend(batch[1])
+        all_labels.extend(list(batch)[1])

     model.set_train()
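
The switch from loss.data[0] to loss.item() reflects a change in PyTorch 0.4: a scalar loss is a zero-dimensional tensor, indexing it with [0] is deprecated there and rejected by later releases, and .item() returns the Python number. A small sketch of one training step in that style follows; the toy model and random data are assumptions for illustration only. It also uses clip_grad_norm_, the in-place successor to the deprecated clip_grad_norm call that the diff leaves unchanged.

```python
import torch
import torch.nn as nn

# Toy model and data; these stand in for the repository's model and loader.
model = nn.Linear(3, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
inputs = torch.randn(5, 3)
targets = torch.randn(5, 1)

optimizer.zero_grad()
loss = nn.functional.mse_loss(model(inputs), targets)
loss.backward()

# Clip gradients in place; the return value is the total norm before clipping.
grad_norm = nn.utils.clip_grad_norm_(model.parameters(), 200)

# loss is a zero-dimensional tensor, so read it with .item(), not loss.data[0].
loss_value = loss.item()
optimizer.step()
```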

0 reactions

NAM-hj commented, May 20, 2020

After applying these changes, I got the following warnings while training:

WARNING: Forward backward likelihood mismatch 0.000084
WARNING: Forward backward likelihood mismatch 0.000092
WARNING: Forward backward likelihood mismatch 0.000046