
Command to train Persona-Chat baseline seq2seq model

See original GitHub issue

After looking at the personachat directory, I was wondering what command to use to train the seq2seq model with ParlAI. It looks like a different seq2seq model is being used there. Some colleagues mentioned that they tried training with the default ParlAI seq2seq using the options from the paper and ran into out-of-memory errors until they reduced the batch size.

The question is: what command would you recommend for replicating the baseline? E.g.: python examples/train_model.py -t babi:task10k:1 -m seq2seq -mf /tmp/model_s2s -bs 32 -vtim 30 -vcut 0.95
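For context, a ConvAI2 run along those lines might look like the sketch below. The flags mirror the ones in the example above; the batch size and validation settings are placeholder values to tune, not values from the paper:

```shell
# Hypothetical starting point, not an official recipe: train the stock
# ParlAI seq2seq on the ConvAI2 (Persona-Chat) task. Lower -bs if you hit
# out-of-memory errors; raise -vtim if validation fires too frequently.
python examples/train_model.py -t convai2 -m seq2seq \
    -mf /tmp/model_convai2_s2s -bs 64 -vtim 1800 -vcut 0.95
```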

Apologies if this sounds like a lazy question, or if there is already an answer that I missed, but hopefully this will be of interest to other people as well.

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 8 (8 by maintainers)

Top GitHub Comments

1 reaction
Henry-E commented, Jun 11, 2018

Thanks so much for the response. That’s super helpful. I’ll try running more models with different options and see what I can figure out.

The reason I mentioned separate-encoder vs. token-marked encoding is that it was the main difference you noted between the personachat-specific seq2seq and the main ParlAI seq2seq. As you said, though, the main ParlAI seq2seq has more features, so it's probably not a fair comparison.
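As an illustration, the two persona encodings being compared here could be sketched as below. The marker token name and the persona/utterance strings are made up for the example; this is not ParlAI's actual preprocessing:

```python
persona = ["i like to ski", "my wife does not like me anymore"]
utterance = "what do you do for fun ?"

# Token-marked encoding: persona sentences are spliced into the same input
# stream as the dialogue, each prefixed by a special marker token, so a
# single encoder sees everything at once.
PERSONA_MARK = "__persona__"  # hypothetical marker, for illustration only
token_marked = " ".join(
    f"{PERSONA_MARK} {sentence}" for sentence in persona
) + " " + utterance

# Separate-encoder setup: persona and utterance are kept apart so each can
# be run through its own encoder and combined later (e.g. via attention).
separate_inputs = {"persona": persona, "utterance": utterance}

print(token_marked)
```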

1 reaction
alexholdenmiller commented, Jun 11, 2018

Let me run through those:

  1. OOM on validation: that's because training was hovering just below your memory limit, and validation may have had a slightly longer example than any in the training set, which pushed it past the max. This is normal, and may just need smaller batch sizes or some truncation of the inputs.
  2a. That seems too high for bottoming out; my runs very consistently get a lot lower than that (~31-34) in approximately two hours when I trained with bsz 128. How long did that take? You may be training slower than on my GPU (especially with the lower bsz), so you may need to, e.g., double the validation time so it doesn't run out of patience as quickly, or adjust the learning rate for that different bsz.
  2b. However, note that the ppl you're seeing is based on the model's cross-entropy loss. This means it includes things like predicting the __END__ token. The leaderboard is based on the separate eval_ppl script, which does a much more careful job of evaluating and doesn't include these extra special characters from the model, so that the different models can be compared exactly. This ppl tends to be worse (e.g. adding a few points to the valid ppl; of course predicting __END__ is easy!).
  3. The seq2seq model in the personachat paper and the seq2seq model you ran are quite different, and we trained them a little differently as well. Most notably, I believe they only trained the persona_seq2seq model on one side of the conversation (e.g. if A and B are talking, only train to predict B's responses from A, not A's from B); the convai2 task includes the conversations from both perspectives, effectively increasing the size of the training set. Again, though, the models are doing different things, and the seq2seq model you trained has a bunch of extra bells and whistles which can be helpful. For example, even just doing "post" attention instead of the "pre" attention the persona-seq2seq model does, I found the ppl dropped by a few points.
  4. Would love to see results on separated vs. token-marked encoding of the persona!
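On the perplexity point above, the gap between the training-loss ppl and the eval_ppl number can be sketched numerically: perplexity is the exponential of the mean per-token cross-entropy, so averaging in a nearly-free __END__ token drags the number down. The per-token loss values below are invented toy numbers, just to show the direction of the effect:

```python
import math

# Toy per-token cross-entropy (negative log-likelihood) values for one
# hypothetical response; the numbers are made up for illustration.
word_losses = [3.2, 2.9, 3.5, 3.1]  # ordinary vocabulary tokens
end_loss = 0.05                     # __END__ is nearly free to predict

def perplexity(losses):
    """Perplexity = exp(mean per-token cross-entropy)."""
    return math.exp(sum(losses) / len(losses))

ppl_with_end = perplexity(word_losses + [end_loss])  # training-loss style
ppl_without_end = perplexity(word_losses)            # eval_ppl style

# Excluding the easy __END__ token yields a higher (worse-looking) number,
# matching the note that eval_ppl adds a few points over the valid ppl.
print(f"with __END__: {ppl_with_end:.1f}")
print(f"without __END__: {ppl_without_end:.1f}")
```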