Seq2Seq on Reddit movie task throws RuntimeError midway through training
Hi, when running the following command with Python 3.6.5 and PyTorch 0.3.1:
```bash
python examples/train_model.py -t "#moviedd-reddit" -dt train:stream -bs 1 -clen 3 -m seq2seq -mf /tmp/s2s -ltim 30 -vtim 30 -stim 30 -vcut 0.95 --dict-maxtokens 50000
```
the model starts to train, but after a few printouts it fails with the following stack trace:
File "examples/train_model.py", line 275, in <module>
TrainLoop(setup_args()).train()
File "examples/train_model.py", line 252, in train
stop_training = self.validate()
File "examples/train_model.py", line 176, in validate
valid_world=self.valid_world)
File "examples/train_model.py", line 106, in run_eval
valid_world.parley()
File "/home/atalreja/code/ParlAI/parlai/core/worlds.py", line 286, in parley
acts[1] = agents[1].act()
File "/home/atalreja/code/ParlAI/parlai/agents/seq2seq/seq2seq.py", line 570, in act
return self.batch_act([self.observation])[0]
File "/home/atalreja/code/ParlAI/parlai/agents/seq2seq/seq2seq.py", line 546, in batch_act
predictions, text_cand_inds = self.predict(xs, ys, cands, valid_cands, is_training)
File "/home/atalreja/code/ParlAI/parlai/agents/seq2seq/seq2seq.py", line 459, in predict
_, scores, _ = self.model(xs, ys)
File "/home/atalreja/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
result = self.forward(*input, **kwargs)
File "/home/atalreja/code/ParlAI/parlai/agents/seq2seq/modules.py", line 73, in forward
y_in = ys.narrow(1, 0, ys.size(1) - 1)
RuntimeError: invalid argument 5: out of range at /pytorch/torch/lib/THC/generic/THCTensor.c:468
I think it might have to do with the context-length setting, because I was trying different values for that. Sorry for the lack of details; I'll keep trying different options.
Thanks!
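For context on the error itself: the failing line is `y_in = ys.narrow(1, 0, ys.size(1) - 1)` in `modules.py`. One plausible trigger (an assumption, not confirmed in this thread) is a label tensor with only one token along dimension 1, which makes the requested narrow zero-length; the THC check behind "invalid argument 5: out of range" rejects any narrow of size <= 0 in PyTorch 0.3. A minimal guard sketch, with a hypothetical helper name:

```python
import torch

def safe_teacher_forcing_input(ys):
    """Guard around the failing narrow() call (hypothetical helper name).

    Assumption: `ys` is a (batch, seq_len) LongTensor of label token ids,
    as passed into the model's forward in parlai/agents/seq2seq/modules.py.
    """
    if ys.size(1) < 2:
        # A single-token label leaves nothing to feed the decoder after
        # dropping the last position; PyTorch 0.3 rejected the resulting
        # zero-length narrow with "invalid argument 5: out of range".
        return None
    # Drop the final token so the decoder sees everything but the last step.
    return ys.narrow(1, 0, ys.size(1) - 1)
```

Newer PyTorch releases accept zero-length narrows, so this particular failure mode appears to be tied to the 0.3-era kernels.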
Fixed, please pull and try again!
Hey, sorry I missed this notification. The candidate ranking code is currently very memory-inefficient, and since the candidate set for this dataset is quite large, this model isn't going to be able to handle it. You'll have to run on CPU (very slowly), choose a different model for ranking, or submit a PR for a memory-friendly implementation.
Since this model doesn't do ranking during training, even a well-trained version doesn't seem to do very well on ranking anyway, so I wouldn't recommend using this model for that.
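For anyone tempted to attempt that PR: below is a minimal sketch of what a memory-friendly ranker could look like, scoring candidates in fixed-size chunks instead of one giant batch. The function name, shapes, and `score_fn` interface are assumptions for illustration, not ParlAI's actual API, and it is written against modern PyTorch (`torch.no_grad()`); 0.3-era code would use volatile `Variable`s instead.

```python
import torch

def rank_candidates_chunked(score_fn, cand_vecs, chunk_size=128):
    """Sketch of memory-friendly candidate ranking (not ParlAI's API).

    Assumptions: `score_fn` maps a (n, seq_len) LongTensor of candidate
    token ids to a (n,) tensor of scores, and `cand_vecs` holds the full
    candidate set. Scoring in chunks bounds peak memory by `chunk_size`
    rather than by the total number of candidates.
    """
    scores = []
    for start in range(0, cand_vecs.size(0), chunk_size):
        chunk = cand_vecs[start:start + chunk_size]
        with torch.no_grad():  # eval-time ranking needs no gradients
            scores.append(score_fn(chunk))
    scores = torch.cat(scores)
    # Return candidate indices sorted best-first.
    return scores.sort(descending=True)[1]
```

Trading a Python-level loop for bounded peak memory is usually an acceptable cost for eval-time ranking, since the chunked matmuls still saturate the GPU.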