
Command to train Persona-Chat baseline seq2seq model

See original GitHub issue

After looking at the personachat directory, I was wondering what command to use to train the seq2seq model with ParlAI. It looks like a different seq2seq model is being used there. Some colleagues mentioned that they tried training with the default ParlAI seq2seq using the options from the paper and ran into out-of-memory errors until they reduced the batch size.

The question is: what command would you recommend for replicating the baseline? E.g.: python examples/train_model.py -t babi:task10k:1 -m seq2seq -mf /tmp/model_s2s -bs 32 -vtim 30 -vcut 0.95
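For context, a ConvAI2 run along those lines might look like the sketch below. The flags mirror the ones in the example above; the batch size and validation settings are placeholder values to tune, not values from the paper:

```shell
# Hypothetical starting point, not an official recipe: train the stock
# ParlAI seq2seq on the ConvAI2 (Persona-Chat) task. Lower -bs if you hit
# out-of-memory errors; raise -vtim if validation fires too frequently.
python examples/train_model.py -t convai2 -m seq2seq \
    -mf /tmp/model_convai2_s2s -bs 64 -vtim 1800 -vcut 0.95
```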

Apologies if this sounds like a lazy question, or if there is already an answer that I missed, but hopefully this will be of interest to other people as well.

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 8 (8 by maintainers)

Top GitHub Comments

1 reaction
Henry-E commented, Jun 11, 2018

Thanks so much for the response. That’s super helpful. I’ll try running more models with different options and see what I can figure out.

The reason I mentioned separate-encoder vs. token-marked encoding is that it was the main difference you noted between the personachat-specific seq2seq and the main ParlAI seq2seq. As you said, though, the main ParlAI seq2seq has more features, so it's probably not a fair comparison.
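As an illustration, the two persona encodings being compared here could be sketched as below. The marker token name and the persona/utterance strings are made up for the example; this is not ParlAI's actual preprocessing:

```python
persona = ["i like to ski", "my wife does not like me anymore"]
utterance = "what do you do for fun ?"

# Token-marked encoding: persona sentences are spliced into the same input
# stream as the dialogue, each prefixed by a special marker token, so a
# single encoder sees everything at once.
PERSONA_MARK = "__persona__"  # hypothetical marker, for illustration only
token_marked = " ".join(
    f"{PERSONA_MARK} {sentence}" for sentence in persona
) + " " + utterance

# Separate-encoder setup: persona and utterance are kept apart so each can
# be run through its own encoder and combined later (e.g. via attention).
separate_inputs = {"persona": persona, "utterance": utterance}

print(token_marked)
```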

1 reaction
alexholdenmiller commented, Jun 11, 2018

Let me run through those:

  1. OOM on validation: that's because training was hovering just below your memory limit, and validation may have had a slightly longer example than any in the training set, which pushed it past the max. This is normal, and may just need smaller batch sizes or some truncation of the inputs.
  2a. That seems too high for bottoming out; my runs very consistently get a lot lower than that (~31-34) in approximately two hours when I trained with bsz 128. How long did that take? You may be training slower than on my GPU (especially with the lower bsz), so you may need to, e.g., double the validation time so it doesn't run out of patience as quickly, or adjust the learning rate for that different bsz.
  2b. However, note that the ppl you're seeing is based on the model's cross-entropy loss. This means it includes things like predicting the __END__ token. The leaderboard is based on the separate eval_ppl script, which does a much more careful job of evaluating and doesn't include these extra special characters from the model, so that the different models can be compared exactly. This ppl tends to be worse (e.g. adding a few points to the valid ppl; of course predicting __END__ is easy!).
  3. The seq2seq model in the personachat paper and the seq2seq model you ran are quite different, and we trained them a little differently as well. Most notably, I believe they only trained the persona_seq2seq model on one side of the conversation (e.g. if A and B are talking, only train to predict B's responses from A, not A's from B); the convai2 task includes the conversations from both perspectives, effectively increasing the size of the training set. Again, though, the models are doing different things, and the seq2seq model you trained has a bunch of extra bells and whistles which can be helpful. For example, even just doing "post" attention instead of the "pre" attention the persona-seq2seq model does, I found the ppl dropped by a few points.
  4. Would love to see results on separated vs. token-marked encoding of the persona!
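On the perplexity point above, the gap between the training-loss ppl and the eval_ppl number can be sketched numerically: perplexity is the exponential of the mean per-token cross-entropy, so averaging in a nearly-free __END__ token drags the number down. The per-token loss values below are invented toy numbers, just to show the direction of the effect:

```python
import math

# Toy per-token cross-entropy (negative log-likelihood) values for one
# hypothetical response; the numbers are made up for illustration.
word_losses = [3.2, 2.9, 3.5, 3.1]  # ordinary vocabulary tokens
end_loss = 0.05                     # __END__ is nearly free to predict

def perplexity(losses):
    """Perplexity = exp(mean per-token cross-entropy)."""
    return math.exp(sum(losses) / len(losses))

ppl_with_end = perplexity(word_losses + [end_loss])  # training-loss style
ppl_without_end = perplexity(word_losses)            # eval_ppl style

# Excluding the easy __END__ token yields a higher (worse-looking) number,
# matching the note that eval_ppl adds a few points over the valid ppl.
print(f"with __END__: {ppl_with_end:.1f}")
print(f"without __END__: {ppl_without_end:.1f}")
```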