[TODO] Investigate equivalence tests

(I added a lot of assignees just to keep you informed and updated going forward. Don’t hesitate to remove yourself if you think it’s irrelevant.)

Currently the PT/TF/Flax equivalence tests use 1e-5 as the tolerance for the absolute differences of outputs.

These tests fail with a non-negligible (although not carefully quantified) frequency.

I created this page to track a list of models to investigate.
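
To illustrate the kind of check being discussed, here is a minimal sketch of a maximum-absolute-difference comparison against a 1e-5 tolerance. The helper names `max_abs_diff` and `outputs_equivalent` are hypothetical and are not taken from the transformers test suite; plain NumPy arrays stand in for real PT/TF/Flax model outputs.

```python
import numpy as np


def max_abs_diff(a, b):
    """Return the largest element-wise absolute difference between two arrays."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.max(np.abs(a - b)))


def outputs_equivalent(first_output, second_output, tol=1e-5):
    """Hypothetical helper: True if the outputs agree within an absolute tolerance."""
    return max_abs_diff(first_output, second_output) <= tol


# Stand-in values, not real model outputs.
reference = np.array([0.25, -1.5, 3.0])
close = reference + 8e-6  # just under the tolerance
far = reference + 2e-5    # above the tolerance

print(outputs_equivalent(reference, close))  # True
print(outputs_equivalent(reference, far))    # False
```

With a purely absolute tolerance, an output that differs by 8e-6 passes and one that differs by 2e-5 fails, which is why runs that land near 1e-5 make the test flaky.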

Issue Analytics

  • State: open
  • Created a year ago
  • Reactions: 1
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
ydshieh commented, Apr 13, 2022

Another one to add to this list: tests/funnel/test_modeling_funnel.py::FunnelModelTest::test_pt_tf_model_equivalence. I’ve been getting a failure in this one every other day – example: https://app.circleci.com/pipelines/github/huggingface/transformers/38007/workflows/2a98b7b1-5ad0-4b80-a702-1887c620193f/jobs/421265

(just for the record) Among 500 runs:

  • 34 runs have a FunnelForMaskedLM.output.logits difference of around 1e-5 to 2e-5, so roughly a 6.8% chance of failure 😢
  • 66 runs are at around 9e-6
  • 38 runs are at around 8e-6

(so more than 25% of runs get close to 1e-5)
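
The percentages above follow directly from the run counts. A short sketch of the arithmetic, using only the numbers quoted in this comment (the variable and bucket names are illustrative):

```python
# Counts reported above, out of 500 runs.
total_runs = 500
runs_by_bucket = {
    "1e-5 to 2e-5": 34,  # at or above the 1e-5 tolerance
    "around 9e-6": 66,   # just below the tolerance
    "around 8e-6": 38,   # just below the tolerance
}

# Share of runs in the bucket at or above the tolerance.
failure_rate = runs_by_bucket["1e-5 to 2e-5"] / total_runs

# Share of runs at or close to the tolerance (all three buckets).
near_or_over = sum(runs_by_bucket.values())
near_rate = near_or_over / total_runs

print(f"failure rate: {failure_rate:.1%}")       # failure rate: 6.8%
print(f"near or over 1e-5: {near_rate:.1%}")     # near or over 1e-5: 27.6%
```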

1 reaction
gante commented, Apr 12, 2022

Another one to add to this list: tests/funnel/test_modeling_funnel.py::FunnelModelTest::test_pt_tf_model_equivalence. I’ve been getting a failure in this one every other day – example: https://app.circleci.com/pipelines/github/huggingface/transformers/38007/workflows/2a98b7b1-5ad0-4b80-a702-1887c620193f/jobs/421265
