Failures in tests
As a test of the latest changes I've run the test suite in a loop with different seeds, for example:

```
for a in `seq 0 35` ; do env DYNESTY_TEST_RANDOMSEED=$a PYTHONPATH=py:tests:$PYTHONPATH pytest tests/test_dyn.py > /tmp/log.${a} & done
```

Roughly 15% of the runs fail like this, with the deviation from the true logz exceeding 5 sigma:
```
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /home/skoposov/curwork/dynesty
plugins: cov-2.12.1, parallel-0.1.0
collected 1 item

tests/test_dyn.py F                                                      [100%]

=================================== FAILURES ===================================
___________________________________ test_dyn ___________________________________

    def test_dyn():
        # hard test of dynamic sampler with high dlogz_init and small number
        # of live points
        ndim = 2
        bound = 'multi'
        rstate = get_rstate()
        sampler = dynesty.DynamicNestedSampler(loglike_egg,
                                               prior_transform_egg,
                                               ndim,
                                               nlive=nlive,
                                               bound=bound,
                                               sample='unif',
                                               rstate=rstate)
        sampler.run_nested(dlogz_init=1, print_progress=printing)
        logz_truth = 235.856
>       assert (abs(logz_truth - sampler.results.logz[-1]) <
                5. * sampler.results.logzerr[-1])
E       assert 0.29720712094015767 < (5.0 * 0.04555387445144435)
E        +  where 0.29720712094015767 = abs((235.856 - 236.15320712094015))

tests/test_dyn.py:39: AssertionError
=========================== short test summary info ============================
FAILED tests/test_dyn.py::test_dyn - assert 0.29720712094015767 < (5.0 * 0.04...
========================= 1 failed in 75.96s (0:01:15) ==========================
```
Presumably there are more. It's also probably worth rerunning the test from https://github.com/joshspeagle/dynesty/issues/289. I haven't yet decided what the strategy should be here. My preferred strategy is to update the defaults so that we still fall within 5 sigma, rather than bumping up the thresholds (in fact there are already a few places in the tests where the thresholds are pretty large).
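To see how incompatible a ~15% failure rate is with a correctly calibrated 5-sigma cut, here is a minimal sanity-check sketch (scipy only; not part of the test suite):

```python
import scipy.stats

# If logzerr were a faithful Gaussian error bar, a two-sided 5-sigma
# exceedance should be vanishingly rare.
p_5sigma = 2 * scipy.stats.norm.sf(5)
print(f"nominal failure rate: {p_5sigma:.2e}")       # ~5.7e-07

# The failing run above deviated by
sigma_obs = 0.29720712094015767 / 0.04555387445144435
print(f"observed deviation: {sigma_obs:.1f} sigma")  # ~6.5 sigma

# A ~15% two-sided failure rate puts the 5-sigma threshold at roughly
# the 1.44-sigma point of the true error distribution, i.e. logzerr
# would underestimate the real scatter by about 5/1.44 ~ 3.5x
# (assuming the errors are Gaussian).
z15 = scipy.stats.norm.isf(0.15 / 2)
print(f"implied underestimate: {5 / z15:.1f}x")      # ~3.5x
```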
Further testing revealed that if I bump up bootstrap to 100, that does resolve the issue with the shell I showed (and it demonstrates that the distribution of bootstrap factors can be extremely asymmetric). That led to another solution I tried, which reduces the failure rate by roughly a factor of a few: cross-validation instead of bootstrap. I.e. split the points into groups, fit using all but one group, and then test on the left-out group. I think that's better because at least every point is guaranteed to appear in the test dataset. In my case the default bootstrap=5 for the shell in 2d leads to 8% of cases missing more than 10% of the volume, while the CV approach seems to reduce that to 1.3% of cases. So I'm intending to implement that.
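A minimal sketch of the cross-validation idea, using a crude bounding-sphere stand-in for dynesty's actual ellipsoid bounds (all helper names here are hypothetical, not dynesty API):

```python
import numpy as np

def expansion_factor(points, test_points):
    """Hypothetical stand-in for the bound fitting: fit a bounding sphere
    to `points` and return the radial expansion needed so that it also
    covers `test_points`."""
    center = points.mean(axis=0)
    r_fit = np.max(np.linalg.norm(points - center, axis=1))
    r_test = np.max(np.linalg.norm(test_points - center, axis=1))
    return max(1.0, r_test / r_fit)

def cv_expand(points, ngroups=5, rstate=None):
    """Leave-one-group-out estimate of the expansion factor: every point
    is held out exactly once, unlike bootstrap resampling, where a given
    point may never land in any test set."""
    rstate = rstate or np.random.default_rng()
    idx = rstate.permutation(len(points))
    groups = np.array_split(idx, ngroups)
    factors = []
    for k in range(ngroups):
        test = groups[k]
        train = np.concatenate([groups[j] for j in range(ngroups) if j != k])
        factors.append(expansion_factor(points[train], points[test]))
    return max(factors)

def bootstrap_expand(points, nboot=5, rstate=None):
    """Bootstrap estimate for comparison: resample with replacement and
    test on the out-of-bag points (~37% per iteration on average)."""
    rstate = rstate or np.random.default_rng()
    n = len(points)
    factors = []
    for _ in range(nboot):
        sel = rstate.integers(n, size=n)
        oob = np.setdiff1d(np.arange(n), sel)
        if len(oob) > 0:
            factors.append(expansion_factor(points[sel], points[oob]))
    return max(factors, default=1.0)
```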
I guess the reason I added that was to have at least some code coverage for all of these combinations, but we may consider moving these tests to some simpler problem…
I kinda thought 20 was overkill, but maybe we need to set that up.
I agree these need to be updated. I initially coded that in a bit of a rush, plus I wanted to speed up the tests, as initially they were doing multiple runs. Also, resample_run is partially used because of the code coverage.
It fails like this:
and I think this is harmless.
This should fail in ~0.6% of cases, which is not that far from the 2.5% of failures, so I was planning to bump up the p-values.
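One way to quantify "not that far" is a binomial test (a minimal sketch with scipy; the run and failure counts below are assumed example values based on the 36-seed loop above):

```python
import scipy.stats

nruns = 36         # assumed: one run per seed in the loop above
nfail = 1          # assumed: one failure, i.e. ~2.8%, close to the quoted 2.5%
p_nominal = 0.006  # the ~0.6% nominal failure probability of the check

# Two-sided binomial test: is the observed failure count consistent
# with the nominal rate, or does the threshold need recalibrating?
res = scipy.stats.binomtest(nfail, nruns, p_nominal)
print(f"p-value = {res.pvalue:.2f}")  # ~0.2, well above 0.05, so consistent
```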
On test_exc, it's just that this code didn't trigger the exception, so we need to bump up the 0.01 parameter.
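For illustration, a hypothetical sketch of the pattern such a test relies on (the class, probabilities, and loop are assumptions for illustration, not dynesty's actual test_exc code): if the likelihood raises randomly with a small probability, a short run may finish without any exception firing, which makes the test flaky.

```python
import numpy as np
import pytest

class RandomlyFailingLoglike:
    """Hypothetical likelihood that raises with probability `p_exc` per
    call; if p_exc is too small (e.g. 0.01) a short run may complete
    without ever triggering it."""
    def __init__(self, p_exc, rstate):
        self.p_exc = p_exc
        self.rstate = rstate

    def __call__(self, x):
        if self.rstate.uniform() < self.p_exc:
            raise RuntimeError('injected failure')
        return -0.5 * np.sum(x**2)

def test_exception_propagates():
    rstate = np.random.default_rng(42)
    loglike = RandomlyFailingLoglike(p_exc=0.05, rstate=rstate)
    with pytest.raises(RuntimeError):
        # stand-in for the actual sampler run; any loop that calls
        # loglike enough times serves for illustration
        for _ in range(1000):
            loglike(rstate.normal(size=2))
```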