Verify model confidences manually before 3.0 release

We ran model regression tests during the integration phase of the architecture revamp. However, those tests do not check the distribution of confidences output by the models. This distribution can be generated by training and testing on a dataset and examining the plot saved as intent_histogram.png. The examination should check whether the confidence distributions of correct and wrong predictions look “approximately” the same when trained with 2.8.x and 3.0.0. (They won’t be exactly the same because of some changes that come with 3.0.)
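One rough way to put a number on “approximately the same” is to bin the confidences and compare the binned distributions between the two versions. The sketch below is only an illustration: it assumes the per-example confidences and correctness flags have already been collected into lists of `(confidence, is_correct)` pairs for each run. The function names, the 10-bin choice, and the distance measure (total variation distance) are assumptions made here, not part of Rasa or of the original issue.

```python
from typing import Dict, List, Tuple

Prediction = Tuple[float, bool]  # (confidence, is_correct); hypothetical layout


def confidence_histogram(confidences: List[float], bins: int = 10) -> List[float]:
    """Return the fraction of confidences falling in each equal-width bin over [0, 1]."""
    counts = [0] * bins
    for c in confidences:
        # Clamp so that a confidence of exactly 1.0 lands in the last bin.
        idx = min(int(c * bins), bins - 1)
        counts[idx] += 1
    total = len(confidences)
    return [n / total if total else 0.0 for n in counts]


def histogram_distance(a: List[float], b: List[float]) -> float:
    """Total variation distance: half the sum of absolute per-bin differences."""
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))


def compare_runs(old: List[Prediction], new: List[Prediction], bins: int = 10) -> Dict[str, float]:
    """Compare correct and wrong confidence distributions between two runs."""
    result = {}
    for label, flag in (("correct", True), ("wrong", False)):
        old_hist = confidence_histogram([c for c, ok in old if ok == flag], bins)
        new_hist = confidence_histogram([c for c, ok in new if ok == flag], bins)
        result[label] = histogram_distance(old_hist, new_hist)
    return result


# Made-up numbers, purely for illustration.
run_2_8 = [(0.95, True), (0.85, True), (0.35, False)]
run_3_0 = [(0.95, True), (0.75, True), (0.35, False)]
distances = compare_runs(run_2_8, run_3_0)
```

A distance of 0 means the binned distributions are identical and 1 means they do not overlap at all. What counts as “approximately the same” remains a judgment call, so a number like this complements the visual check of the plots rather than replacing it.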

The datasets and configs (found in the training-data repo) on which this should be tested at the very least (covering English and German datasets plus configs that are frequently used by customers):

Dataset: public/Sara

Configs:

  • en/cvf_bert_diet_responset2t.yml
  • en/cvf_diet_responset2t.yml
  • en/cvf_embedding_responseb2b.yml
  • en/cvf_bert_embedding_responseb2b.yml

Dataset: private/service_faq

Configs:

  • en/cvf_spacy_diet_responset2t.yml
  • en/cvf_diet_responset2t.yml
  • en/cvf_embedding_responseb2b.yml
  • en/cvf_spacy_embedding_responseb2b.yml

Definition of Done:

  • Training and evaluation have been run for each of the above dataset and config combinations using 2.8.x and a release candidate of 3.0.0 / main branch of Rasa OSS.
  • Verified that intent_histogram.png looks “approximately similar” across the two versions for every unique dataset and config combination.

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 19 (18 by maintainers)

Top GitHub Comments

3 reactions
dakshvar22 commented, Mar 17, 2022

Are you saying that the difference between 2.8.x and 3.0.x is more than the difference expected from e.g. re-running on 2.8.x?

Re-running on 2.8.x multiple times on a CPU yields exactly the same model confidences, and hence exactly the same confidence histograms. So we should really find out what is causing the difference in model confidences when the same config, dataset, and CPU machine are used, but with 2.8.x and 3.0 installations of Rasa.
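This reasoning suggests a per-example check: two runs on the same version should produce no confidence differences at all, while a run on 2.8.x against a run on 3.0 will show which examples moved and by how much. The snippet below is a minimal sketch under the assumption that each run's confidences have been gathered into a dictionary keyed by example text. The helper name `confidence_deltas` and that data layout are hypothetical, not something Rasa provides.

```python
from typing import Dict


def confidence_deltas(
    run_a: Dict[str, float], run_b: Dict[str, float], tol: float = 0.0
) -> Dict[str, float]:
    """Return (run_b - run_a) for every shared example whose confidence moved by more than tol."""
    deltas = {}
    for text, conf_a in run_a.items():
        if text not in run_b:
            continue  # only compare examples present in both runs
        diff = run_b[text] - conf_a
        if abs(diff) > tol:
            deltas[text] = diff
    return deltas


# Made-up confidences, purely for illustration.
first_run = {"hi": 0.9, "bye": 0.8}
second_run = {"hi": 0.9, "bye": 0.75}

same_version = confidence_deltas(first_run, first_run)    # deterministic rerun: no shifts
cross_version = confidence_deltas(first_run, second_run)  # only "bye" has shifted
```

With `tol=0.0` the check is exact, which matches the determinism expected from CPU reruns on one version. A small positive tolerance may be more practical when the goal is only to list the examples whose confidence shifted noticeably between versions.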

I should emphasize that the differences don’t appear to be large (e.g. in the private/service_faq dataset, the distribution seems to be shifted by a small amount for 10-11 training examples), so it isn’t a high-priority investigation. Nevertheless, it should be done at some point to prevent an unknown regression from causing larger regressions in the future.

1 reaction
dakshvar22 commented, Mar 17, 2022

@m-vdb @joejuzl There are a few open questions here (apologies for not replying to Kathrin’s question earlier) and I don’t think the issue should be closed. Do you want the conversation to happen anywhere else?
