Seq2Seq on Reddit movie task throws RuntimeError midway through training
Hi, when running the following command with Python 3.6.5 and PyTorch 0.3.1:
```bash
python examples/train_model.py -t "#moviedd-reddit" -dt train:stream -bs 1 -clen 3 -m seq2seq -mf /tmp/s2s -ltim 30 -vtim 30 -stim 30 -vcut 0.95 --dict-maxtokens 50000
```
the model starts to train, but after a few printouts it fails with the following stack trace:
File "examples/train_model.py", line 275, in <module>
TrainLoop(setup_args()).train()
File "examples/train_model.py", line 252, in train
stop_training = self.validate()
File "examples/train_model.py", line 176, in validate
valid_world=self.valid_world)
File "examples/train_model.py", line 106, in run_eval
valid_world.parley()
File "/home/atalreja/code/ParlAI/parlai/core/worlds.py", line 286, in parley
acts[1] = agents[1].act()
File "/home/atalreja/code/ParlAI/parlai/agents/seq2seq/seq2seq.py", line 570, in act
return self.batch_act([self.observation])[0]
File "/home/atalreja/code/ParlAI/parlai/agents/seq2seq/seq2seq.py", line 546, in batch_act
predictions, text_cand_inds = self.predict(xs, ys, cands, valid_cands, is_training)
File "/home/atalreja/code/ParlAI/parlai/agents/seq2seq/seq2seq.py", line 459, in predict
_, scores, _ = self.model(xs, ys)
File "/home/atalreja/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
result = self.forward(*input, **kwargs)
File "/home/atalreja/code/ParlAI/parlai/agents/seq2seq/modules.py", line 73, in forward
y_in = ys.narrow(1, 0, ys.size(1) - 1)
RuntimeError: invalid argument 5: out of range at /pytorch/torch/lib/THC/generic/THCTensor.c:468
I think it might have to do with the context-length setting, because I was trying different values for that. Sorry for the lack of details; I'll keep trying different options.
Thanks!
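For context on the error itself: the failing line is `y_in = ys.narrow(1, 0, ys.size(1) - 1)` in `modules.py`. One plausible trigger (an assumption, not confirmed in this thread) is a label tensor with only one token along dimension 1, which makes the requested narrow zero-length; the THC check behind "invalid argument 5: out of range" rejects any narrow of size <= 0 in PyTorch 0.3. A minimal guard sketch, with a hypothetical helper name:

```python
import torch

def safe_teacher_forcing_input(ys):
    """Guard around the failing narrow() call (hypothetical helper name).

    Assumption: `ys` is a (batch, seq_len) LongTensor of label token ids,
    as passed into the model's forward in parlai/agents/seq2seq/modules.py.
    """
    if ys.size(1) < 2:
        # A single-token label leaves nothing to feed the decoder after
        # dropping the last position; PyTorch 0.3 rejected the resulting
        # zero-length narrow with "invalid argument 5: out of range".
        return None
    # Drop the final token so the decoder sees everything but the last step.
    return ys.narrow(1, 0, ys.size(1) - 1)
```

Newer PyTorch releases accept zero-length narrows, so this particular failure mode appears to be tied to the 0.3-era kernels.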
Fixed, please pull and try again!
Hey, sorry I missed this notification. The candidate ranking code is currently very memory-inefficient, and since the candidate set for this dataset is quite large, this model isn't going to be able to handle it. You'll have to run on CPU (very slowly), choose a different model for ranking, or submit a PR for a memory-friendly implementation.
Since this model doesn't do ranking during training, even a well-trained version doesn't seem to do very well on ranking anyway, so I wouldn't recommend using this model for that.
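For anyone tempted to attempt that PR: below is a minimal sketch of what a memory-friendly ranker could look like, scoring candidates in fixed-size chunks instead of one giant batch. The function name, shapes, and `score_fn` interface are assumptions for illustration, not ParlAI's actual API, and it is written against modern PyTorch (`torch.no_grad()`); 0.3-era code would use volatile `Variable`s instead.

```python
import torch

def rank_candidates_chunked(score_fn, cand_vecs, chunk_size=128):
    """Sketch of memory-friendly candidate ranking (not ParlAI's API).

    Assumptions: `score_fn` maps a (n, seq_len) LongTensor of candidate
    token ids to a (n,) tensor of scores, and `cand_vecs` holds the full
    candidate set. Scoring in chunks bounds peak memory by `chunk_size`
    rather than by the total number of candidates.
    """
    scores = []
    for start in range(0, cand_vecs.size(0), chunk_size):
        chunk = cand_vecs[start:start + chunk_size]
        with torch.no_grad():  # eval-time ranking needs no gradients
            scores.append(score_fn(chunk))
    scores = torch.cat(scores)
    # Return candidate indices sorted best-first.
    return scores.sort(descending=True)[1]
```

Trading a Python-level loop for bounded peak memory is usually an acceptable cost for eval-time ranking, since the chunked matmuls still saturate the GPU.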