expected behavior for `keep_last_ckpts = -1`
Hi @juliakreutzer,
I wanted to save ckpts at every validation step regardless of the early-stopping-metric score, so I set `keep_last_ckpts = -1`, according to the description here:
But joeynmt didn't save ckpts at all in that case. Actually, the TrainManager doesn't call the `_save_checkpoint()` func if `keep_last_ckpts` is less than or equal to zero, even though a Python queue with such a max size has infinite length (https://docs.python.org/3/library/queue.html).
https://github.com/joeynmt/joeynmt/blob/46b2fe3b05638728413ee5bae6a347411175c3c5/joeynmt/training.py#L98-L99
https://github.com/joeynmt/joeynmt/blob/46b2fe3b05638728413ee5bae6a347411175c3c5/joeynmt/training.py#L544-L547
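To illustrate the queue semantics mentioned above, here is a minimal sketch using only the standard library. It is not joeynmt's code: `make_ckpt_queue` and `record_checkpoint` are hypothetical helpers, and the checkpoint names are made up. It only shows that `queue.Queue` with `maxsize <= 0` never reports itself as full, so nothing would ever be evicted if saving were not skipped.

```python
import queue


def make_ckpt_queue(keep_last_ckpts):
    """Hypothetical helper: queue.Queue treats maxsize <= 0 as unbounded."""
    return queue.Queue(maxsize=keep_last_ckpts)


def record_checkpoint(ckpt_queue, path):
    """Hypothetical helper: register a new checkpoint path.

    Returns the path that should be deleted to make room, or None.
    """
    evicted = None
    if ckpt_queue.full():  # always False for an unbounded queue
        evicted = ckpt_queue.get()
    ckpt_queue.put(path)
    return evicted


steps = (1000, 2000, 3000)

# keep_last_ckpts = -1: every checkpoint stays, nothing is evicted.
unbounded = make_ckpt_queue(-1)
evicted_unbounded = [record_checkpoint(unbounded, f"{s}.ckpt") for s in steps]

# keep_last_ckpts = 2: the oldest checkpoint is evicted on the third save.
bounded = make_ckpt_queue(2)
evicted_bounded = [record_checkpoint(bounded, f"{s}.ckpt") for s in steps]
```

With this reading, `keep_last_ckpts = -1` would mean "keep everything", which is the behavior the config description suggests.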
What is the expected behavior? Did you indeed intend no save action if `keep_last_ckpts = -1`, meaning the description in the config was wrong? Or can we change the code so that ckpts will be saved every time if `keep_last_ckpts = -1`?
Issue Analytics
- State:
- Created 3 years ago
- Comments: 9
Top GitHub Comments
Yes, you're right, it's not well defined when both queues are overlapping. I like your simplification idea: save the most recent one plus any additional best checkpoints. I think this fits the practical use cases in the best way without creating confusion 👍
Yeah, it sounds good. We could have multiple queues, but then it gets a bit complicated to handle mis-specifications such as `keep_last_ckpts=-1` and `keep_best_ckpts=10`, no? Although we could return a configuration error and abort the process, that feels a bit too harsh to me, especially if I only see such a config error after waiting for a huge dataset to load…
If the use of the latest ckpt is almost limited to resuming interrupted training, how about always saving the latest ckpt by default, without any option? And we always use the "best" criterion to determine the number of ckpts to save/delete? Of course, we change the "best" logic so that really the best n models will be saved. For instance, in the following case, best 3 means [4000, 6000, 2000], instead of the ckpts marked with `*`. Even though step 6000 doesn't beat the best BLEU so far, it is still better than the worst one in the queue, so we update our queue. Then it's more likely that relatively newer ckpts will be kept.

Maybe it's rather my personal preference. Misspecification can happen whatever we define for this option. Two separate queues for `best` and `last` also sound reasonable to me.
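The "really best n" proposal discussed above could be sketched roughly as follows. This is an illustration only, not joeynmt's implementation: the `BestCheckpoints` class is hypothetical, a higher score is assumed to be better (as with BLEU), and the scores are invented so that the result matches the [4000, 6000, 2000] outcome mentioned in the comment. The latest checkpoint would be kept separately and is not modeled here.

```python
import heapq


class BestCheckpoints:
    """Hypothetical tracker that keeps the n best-scoring checkpoints."""

    def __init__(self, keep_best):
        self.keep_best = keep_best
        self._heap = []  # min-heap of (score, step); worst kept entry is first

    def update(self, step, score):
        """Offer a new checkpoint; return the step dropped from the best n, if any."""
        if len(self._heap) < self.keep_best:
            heapq.heappush(self._heap, (score, step))
            return None
        worst_score, _ = self._heap[0]
        if score > worst_score:
            # Better than the worst kept one: swap it in, even if it is
            # not the best score seen so far.
            _, dropped_step = heapq.heapreplace(self._heap, (score, step))
            return dropped_step
        return None  # not among the best n; only the "latest" slot would hold it

    def best_steps(self):
        """Steps of the kept checkpoints, best score first."""
        return [step for _, step in sorted(self._heap, reverse=True)]


# Made-up validation scores, for illustration only.
history = [(1000, 10.0), (2000, 12.0), (3000, 11.0),
           (4000, 15.0), (5000, 11.5), (6000, 13.0)]

tracker = BestCheckpoints(keep_best=3)
dropped = [tracker.update(step, score) for step, score in history]
kept = tracker.best_steps()
```

Here step 6000 does not beat the best score (step 4000) but replaces the worst kept entry, so `kept` ends up as `[4000, 6000, 2000]`.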