
Clarification questions


Hi, I have a few questions regarding TransCoder’s training data and optimization settings.

  1. From the paper, it is clear that TransCoder is trained on standalone functions during the DAE+BT training stage. But is TransCoder also trained only on standalone functions during the MLM stage?
  2. During the MLM stage, only the encoder part of TransCoder is pre-trained, right?
  3. For the MLM pre-training, max_epoch and epoch_size are set to 100k. If I understand correctly, epoch_size refers to the number of instances used in each epoch. Is that correct? Also, for MLM pre-training, the following flags are set:
--validation_metrics _valid_mlm_ppl \
--stopping_criterion '_valid_mlm_ppl,10' 

So I am assuming the MLM pre-training is stopped based on the stopping_criterion. Before the pre-training was stopped, how many optimization steps had been executed?

  4. Unlike the MLM pre-training stage, no stopping_criterion is set for the DAE+BT training stage, while epoch_size is set to 50000 and max_epoch to 10000000. So when does the training stop? How many optimization steps were executed during this stage?
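
My current reading of the stopping_criterion flag, sketched below, is based on the XLM convention that a leading underscore marks a lower-is-better metric and the number is a patience counter. This is only an illustration of that convention, not TransCoder’s actual trainer code, so the resulting number of steps depends entirely on when the validation perplexity plateaus:

```python
# Patience-based stopping criterion, assuming the XLM convention that
# '_valid_mlm_ppl,10' means: minimize valid_mlm_ppl, stop after 10
# consecutive validations without improvement. Illustration only.

def parse_stopping_criterion(spec):
    metric, patience = spec.split(",")
    minimize = metric.startswith("_")      # leading "_" -> lower is better
    return metric.lstrip("_"), int(patience), minimize

def should_stop(history, patience, minimize):
    """history: one validation score per epoch, oldest first."""
    if len(history) <= patience:
        return False
    best = min(history) if minimize else max(history)
    best_idx = history.index(best)
    # stop once the best score is more than `patience` validations old
    return (len(history) - 1) - best_idx >= patience

metric, patience, minimize = parse_stopping_criterion("_valid_mlm_ppl,10")
scores = [12.0, 10.5, 9.8, 9.9, 9.85, 9.9, 9.95, 10.0, 10.1, 10.2, 10.3, 10.4, 10.5]
print(should_stop(scores, patience, minimize))  # True: 10 epochs with no improvement
```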

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
brozi commented, Sep 23, 2021

It’s per GPU, so yes, the actual batch size is 32 * 32.
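
Spelled out as a quick check (assuming the 32-GPU setup referenced in this thread):

```python
# Effective batch size per optimizer step, assuming 32 GPUs and the
# per-GPU batch_size of 32 mentioned in this thread.
per_gpu_batch_size = 32
num_gpus = 32
print(per_gpu_batch_size * num_gpus)  # 1024 sentences per step
```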

0 reactions
brozi commented, Sep 24, 2021

The epoch size is supposed to be the number of samples you train on. However, there is something a bit confusing in our code: we always increase the sentence counter by the batch_size parameter during training, but that is not always the actual batch size.

  • for stream datasets (e.g. for MLM) the actual batch size equals the batch_size parameter, so epoch_size really is the number of samples per epoch (per GPU, so multiply by 32 for the total)
  • for DAE and BT we fix the number of tokens per batch (rather than the number of sentences) to avoid OOMs from batches of varying sizes, so the actual batch size is not constant. In that case an epoch corresponds to roughly epoch_size / 32 optimizer updates (32 being the default batch_size parameter), which doesn’t really cause any issues but is quite confusing (see the sketch below).
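
A rough sketch of that bookkeeping follows. This is not the real TransCoder/XLM trainer; it only assumes the counter behaviour described in the two bullets above:

```python
# Illustration of the epoch bookkeeping described above (not the actual
# TransCoder/XLM trainer code).

BATCH_SIZE_PARAM = 32   # the default --batch_size value mentioned above
EPOCH_SIZE = 50_000     # the --epoch_size used for the DAE+BT stage

def updates_per_epoch(epoch_size, batch_size_param):
    # The sentence counter is bumped by the batch_size *parameter* at every
    # optimizer step, regardless of how many sentences the token-sized batch
    # actually contained; the epoch closes once it reaches epoch_size.
    n_sentences = 0
    n_updates = 0
    while n_sentences < epoch_size:
        n_sentences += batch_size_param
        n_updates += 1
    return n_updates

# Stream datasets (MLM): the parameter equals the real batch size, so
# epoch_size is truly the number of samples seen per GPU per epoch.
# DAE/BT: the real batch size varies (fixed token budget), so an "epoch"
# is simply about epoch_size / 32 optimizer updates:
print(updates_per_epoch(EPOCH_SIZE, BATCH_SIZE_PARAM))  # 1563 == ceil(50000 / 32)
```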

I’ll think about changing the behaviour of this parameter to have something more coherent in a way that minimizes the changes people will need to make to the parameter.
