Inconsistencies with scoring and inference

There are a few inconsistencies related to inference that have come up in evaluating scoring (#538). However, these are corner cases that could arise in other settings as well. The problem is:

  • Length normalization is only applied to completed hypotheses on the beam. However, if the beam hits the maximum sequence length before generating </s>, the hypothesis is still returned, but with its unnormalized score.
  • This is a general problem for scoring: when retrieving a score from the Sockeye inference CLI (via --output-type translation_with_score), there is no way to know whether </s> was generated and therefore whether length normalization was applied (since </s> is stripped before returning to the user).
  • Sockeye’s scores are therefore underspecified.

This is a problem for evaluating scoring. Scoring takes raw text and will therefore always append </s>, just as is done in training. I am running into this problem because sometimes the outputs have not actually finished but have merely hit the maximum output length, so their scores are unnormalized. This could be a problem more generally.
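
To make the comparability problem concrete, here is a minimal sketch (not Sockeye code) assuming a GNMT-style length penalty of the kind applied to finished hypotheses; the `length_penalty` helper and the alpha/beta values are illustrative only.

```python
# Two hypotheses with the same raw negative log-likelihood end up with very
# different reported scores, depending only on whether </s> was generated
# before the length limit was reached.

def length_penalty(length: int, alpha: float = 1.0, beta: float = 0.0) -> float:
    """GNMT-style penalty: ((beta + length) / (beta + 1)) ** alpha."""
    return ((beta + length) / (beta + 1)) ** alpha

raw_nll = 12.0   # accumulated negative log-likelihood of the hypothesis
length = 10      # number of generated tokens

# Hypothesis A generated </s> in time: its score is length-normalized.
score_finished = raw_nll / length_penalty(length)   # 1.2

# Hypothesis B hit the maximum output length without </s>: the raw score is returned.
score_truncated = raw_nll                           # 12.0

# With --output-type translation_with_score both appear as a single number, and
# </s> has been stripped, so the user cannot tell which case produced which score.
print(score_finished, score_truncated)
```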

I am not sure of the correct solution, but I propose this:

  • In inference, --maximum-output-length should refer to the hypothesis excluding </s>. The reasoning is that this flag is a user-facing feature, and users do not see the </s> since it is stripped off.
  • Beam search stops at the maximum output length. If there are unfinished hypotheses, however, the decoder should take one more step and force the selection of </s> for each unfinished hypothesis. Length normalization is then applied as well.
  • Length normalization should be computed including </s>, since it is generated by the decoder.

This way, the user can be guaranteed that every hypothesis actually finished, and that all obtained scores are comparable.
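
A minimal sketch of the proposed finalization step is below. It is not Sockeye's actual implementation: it assumes the beam keeps an accumulated negative log-probability per hypothesis that stays unnormalized until this step, and the `Hypothesis` class, `finalize` function, `eos_neg_log_prob` argument, and alpha/beta defaults are all hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    tokens: List[int]   # generated token ids so far (</s> included once finished)
    score: float        # accumulated negative log-probability, not yet normalized
    finished: bool      # True once </s> has been generated


def finalize(beam: List[Hypothesis],
             eos_id: int,
             eos_neg_log_prob: List[float],
             alpha: float = 1.0,
             beta: float = 0.0) -> List[Hypothesis]:
    """At the maximum output length, force </s> onto unfinished hypotheses,
    then length-normalize every hypothesis, counting </s> in the length."""
    for hyp, nlp_eos in zip(beam, eos_neg_log_prob):
        if not hyp.finished:
            # One extra forced step: select </s> and pay its cost under the model.
            hyp.tokens.append(eos_id)
            hyp.score += nlp_eos
            hyp.finished = True
        # Normalize including </s>, so every returned score is comparable.
        penalty = ((beta + len(hyp.tokens)) / (beta + 1)) ** alpha
        hyp.score /= penalty
    return beam
```

Forcing </s> charges a truncated hypothesis the model's actual end-of-sentence cost instead of leaving it unterminated, which is what makes normalizing over a consistent definition of length well-defined.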

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Comments: 9 (9 by maintainers)

Top GitHub Comments

1 reaction
tdomhan commented, Sep 26, 2018

Yeah, that's a good point. So I guess we could always take the probability of </s> as the last token for truncated hypotheses. This is effectively what we do anyway: we force the model to stop.

0 reactions
fhieber commented, Aug 29, 2019

Sockeye 2 addresses this issue; see #719.
