Failures in tests
As a test of the latest changes I've run the test suite in a loop with different seeds, for example:

```
for a in `seq 0 35` ; do env DYNESTY_TEST_RANDOMSEED=$a PYTHONPATH=py:tests:$PYTHONPATH pytest tests/test_dyn.py > /tmp/log.${a} & done
```

Roughly 15% of the runs fail like this, with the deviation from the true logz exceeding 5 sigma:
```
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /home/skoposov/curwork/dynesty
plugins: cov-2.12.1, parallel-0.1.0
collected 1 item

tests/test_dyn.py F                                                      [100%]

=================================== FAILURES ===================================
___________________________________ test_dyn ___________________________________

    def test_dyn():
        # hard test of dynamic sampler with high dlogz_init and small number
        # of live points
        ndim = 2
        bound = 'multi'
        rstate = get_rstate()
        sampler = dynesty.DynamicNestedSampler(loglike_egg,
                                               prior_transform_egg,
                                               ndim,
                                               nlive=nlive,
                                               bound=bound,
                                               sample='unif',
                                               rstate=rstate)
        sampler.run_nested(dlogz_init=1, print_progress=printing)
        logz_truth = 235.856
>       assert (abs(logz_truth - sampler.results.logz[-1]) <
                5. * sampler.results.logzerr[-1])
E       assert 0.29720712094015767 < (5.0 * 0.04555387445144435)
E        +  where 0.29720712094015767 = abs((235.856 - 236.15320712094015))

tests/test_dyn.py:39: AssertionError
=========================== short test summary info ============================
FAILED tests/test_dyn.py::test_dyn - assert 0.29720712094015767 < (5.0 * 0.04...
========================= 1 failed in 75.96s (0:01:15) ==========================
```
Presumably there are more. It's also probably worth rerunning the test from https://github.com/joshspeagle/dynesty/issues/289. I haven't yet decided what the strategy should be here. My preferred strategy is to update the defaults so that we still fall within 5 sigma, rather than bumping up the thresholds (in fact there are already a few places in the tests where the thresholds are pretty large).
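To see how incompatible a ~15% failure rate is with a correctly calibrated 5-sigma cut, here is a minimal sanity-check sketch (scipy only; not part of the test suite):

```python
import scipy.stats

# If logzerr were a faithful Gaussian error bar, a two-sided 5-sigma
# exceedance should be vanishingly rare.
p_5sigma = 2 * scipy.stats.norm.sf(5)
print(f"nominal failure rate: {p_5sigma:.2e}")       # ~5.7e-07

# The failing run above deviated by
sigma_obs = 0.29720712094015767 / 0.04555387445144435
print(f"observed deviation: {sigma_obs:.1f} sigma")  # ~6.5 sigma

# A ~15% two-sided failure rate puts the 5-sigma threshold at roughly
# the 1.44-sigma point of the true error distribution, i.e. logzerr
# would underestimate the real scatter by about 5/1.44 ~ 3.5x
# (assuming the errors are Gaussian).
z15 = scipy.stats.norm.isf(0.15 / 2)
print(f"implied underestimate: {5 / z15:.1f}x")      # ~3.5x
```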
Further testing revealed that if I bump up bootstrap to 100, that does resolve the issue with the shell I showed (and it demonstrates that the distribution of bootstrap factors can be extremely asymmetric). That led to another solution I tried, which reduces the failure rate by roughly a factor of a few: cross-validation instead of bootstrap. I.e. split the points into groups, fit using all but one group, and then test on the left-out group. I think that's better because at least every point is guaranteed to appear in the test dataset. In my case the default bootstrap=5 for the shell in 2d leads to 8% of cases missing more than 10% of the volume, while the CV approach seems to reduce that to 1.3% of cases. So I'm intending to implement that.
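A minimal sketch of the cross-validation idea, using a crude bounding-sphere stand-in for dynesty's actual ellipsoid bounds (all helper names here are hypothetical, not dynesty API):

```python
import numpy as np

def expansion_factor(points, test_points):
    """Hypothetical stand-in for the bound fitting: fit a bounding sphere
    to `points` and return the radial expansion needed so that it also
    covers `test_points`."""
    center = points.mean(axis=0)
    r_fit = np.max(np.linalg.norm(points - center, axis=1))
    r_test = np.max(np.linalg.norm(test_points - center, axis=1))
    return max(1.0, r_test / r_fit)

def cv_expand(points, ngroups=5, rstate=None):
    """Leave-one-group-out estimate of the expansion factor: every point
    is held out exactly once, unlike bootstrap resampling, where a given
    point may never land in any test set."""
    rstate = rstate or np.random.default_rng()
    idx = rstate.permutation(len(points))
    groups = np.array_split(idx, ngroups)
    factors = []
    for k in range(ngroups):
        test = groups[k]
        train = np.concatenate([groups[j] for j in range(ngroups) if j != k])
        factors.append(expansion_factor(points[train], points[test]))
    return max(factors)

def bootstrap_expand(points, nboot=5, rstate=None):
    """Bootstrap estimate for comparison: resample with replacement and
    test on the out-of-bag points (~37% per iteration on average)."""
    rstate = rstate or np.random.default_rng()
    n = len(points)
    factors = []
    for _ in range(nboot):
        sel = rstate.integers(n, size=n)
        oob = np.setdiff1d(np.arange(n), sel)
        if len(oob) > 0:
            factors.append(expansion_factor(points[sel], points[oob]))
    return max(factors, default=1.0)
```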
I guess the reason I added that was to have at least some code coverage for all of these combinations, but we may consider moving these tests to some simpler problem…
I kinda thought 20 was overkill, but maybe we need to set that up.
I agree these need to be updated. I initially coded that in a bit of a rush, plus I wanted to speed up the tests, as initially they were doing multiple runs. Also, resample_run is partially used because of the code coverage.
It fails like this:
and I think this is harmless.
This should fail in ~0.6% of cases, which is not that far from the 2.5% of failures, so I was planning to bump up the p-values.
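One way to quantify "not that far" is a binomial test (a minimal sketch with scipy; the run and failure counts below are assumed example values based on the 36-seed loop above):

```python
import scipy.stats

nruns = 36         # assumed: one run per seed in the loop above
nfail = 1          # assumed: one failure, i.e. ~2.8%, close to the quoted 2.5%
p_nominal = 0.006  # the ~0.6% nominal failure probability of the check

# Two-sided binomial test: is the observed failure count consistent
# with the nominal rate, or does the threshold need recalibrating?
res = scipy.stats.binomtest(nfail, nruns, p_nominal)
print(f"p-value = {res.pvalue:.2f}")  # ~0.2, well above 0.05, so consistent
```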
On test_exc, it's just that this code didn't trigger the exception, so we need to bump up the 0.01 parameter.
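For illustration, a hypothetical sketch of the pattern such a test relies on (the class, probabilities, and loop are assumptions for illustration, not dynesty's actual test_exc code): if the likelihood raises randomly with a small probability, a short run may finish without any exception firing, which makes the test flaky.

```python
import numpy as np
import pytest

class RandomlyFailingLoglike:
    """Hypothetical likelihood that raises with probability `p_exc` per
    call; if p_exc is too small (e.g. 0.01) a short run may complete
    without ever triggering it."""
    def __init__(self, p_exc, rstate):
        self.p_exc = p_exc
        self.rstate = rstate

    def __call__(self, x):
        if self.rstate.uniform() < self.p_exc:
            raise RuntimeError('injected failure')
        return -0.5 * np.sum(x**2)

def test_exception_propagates():
    rstate = np.random.default_rng(42)
    loglike = RandomlyFailingLoglike(p_exc=0.05, rstate=rstate)
    with pytest.raises(RuntimeError):
        # stand-in for the actual sampler run; any loop that calls
        # loglike enough times serves for illustration
        for _ in range(1000):
            loglike(rstate.normal(size=2))
```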