eigh() tests fail to pass, crash Python with seemingly random pattern

This problem is related to #11601, which was closed by #11702 (@ilayn). However, that PR did not fix the crash.

The symptoms remain almost identical to those described in my comment at https://github.com/scipy/scipy/issues/11601#issuecomment-600153321

In summary, when running the test for eigh(), Python tends to crash with SIGSEGV or SIGABRT. Sometimes this happens during the test_eigh() function, sometimes after it passed with “100%” but before pytest returns.

The test that triggers the crash is the following test function:

https://github.com/scipy/scipy/blob/ae34ce4835949a8310d7c3d7bcb4a55aafd11f4f/scipy/linalg/tests/test_decomp.py#L863-L888

Some patterns from the histories of crashes

I ran the test script with runtests.py 100 times and saved the output of each run as a text file.
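For anyone who wants to repeat this kind of experiment, here is a minimal sketch of such a loop. It is not the script I used; the helper names `run_repeatedly` and `describe` and the log file naming are made up for illustration. It relies on the POSIX behaviour that `subprocess` reports a child killed by signal N as return code -N, and it assumes the signal actually reaches the process being launched rather than being swallowed by a wrapper.

```python
import signal
import subprocess
from pathlib import Path


def run_repeatedly(cmd, runs, out_dir):
    """Run `cmd` `runs` times, save each run's output, return the exit codes.

    Hypothetical helper: stdout and stderr of run i go to out_dir/run_<i>.txt.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    codes = []
    for i in range(runs):
        log = out_dir / f"run_{i:03d}.txt"
        with log.open("w") as fh:
            proc = subprocess.run(cmd, stdout=fh, stderr=subprocess.STDOUT)
        codes.append(proc.returncode)
    return codes


def describe(code):
    """Turn a return code into a short label such as 'SIGSEGV' or 'passed'."""
    if code < 0:
        # On POSIX, a negative return code -N means "killed by signal N".
        try:
            return signal.Signals(-code).name
        except ValueError:
            return f"signal {-code}"
    return "passed" if code == 0 else f"exit {code}"
```

With this, something like `run_repeatedly(["./runtests.py", "-vt", "scipy/linalg/tests/test_decomp.py::TestEigh::test_eigh"], 100, "eigh_runs")` followed by `collections.Counter(map(describe, codes))` would give a rough count of clean runs versus SIGSEGV/SIGABRT, and the saved text files can then be grepped for the last test ID printed.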

By grepping the output files of ./runtests.py, I noticed that the last known position in Python before the crash is one of three lines: 873, 876, or 877. Line 873 is the actual call to eigh(), but the crash can happen as late as line 876 or 877, where the arrays returned from eigh() are accessed.

Only 6 out of 100 runs passed without any problems.

In some cases (35 out of the 100), Python segfaults after nominally completing all the tests in TestEigh::test_eigh.

In the cases where Python was killed with SIGABRT, 36 were at line 873 (the call to eigh()), while 9 were at line 876, where the output z was used. In many other runs, the test script did not appear in the Python backtrace, if there was a backtrace at all.

The parametrized inputs that triggered the crash were of the form test_eigh[6-D-XXX-YYY-ZZZ-eigvals1]. That is, the crashes happened for dimension 6, dtype double complex, with the eigvals= keyword parameter set to the tuple (2, 4). The XXX, YYY, and ZZZ parameters are boolean flags for the keywords turbo, lower, and overwrite respectively.
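To make that mapping concrete, the following sketch decodes one of these pytest IDs into named fields. The function `parse_eigh_test_id` is a hypothetical helper written for this illustration, not part of SciPy or pytest; the field order (dimension, dtype character, turbo, lower, overwrite, eigvals ID) is the one described above.

```python
import re

# Pattern for IDs such as "test_eigh[6-D-True-False-True-eigvals1]".
_ID_RE = re.compile(
    r"test_eigh\[(\d+)-([A-Za-z])-(True|False)-(True|False)-(True|False)-(\w+)\]"
)


def parse_eigh_test_id(test_id):
    """Split a test_eigh parametrization ID into its named parameters."""
    m = _ID_RE.fullmatch(test_id)
    if m is None:
        raise ValueError(f"not a test_eigh parametrization ID: {test_id!r}")
    dim, dtype_char, turbo, lower, overwrite, eigvals_id = m.groups()
    return {
        "dim": int(dim),
        "dtype_char": dtype_char,  # 'D' is the NumPy typecode for double complex
        "turbo": turbo == "True",
        "lower": lower == "True",
        "overwrite": overwrite == "True",
        "eigvals_id": eigvals_id,  # "eigvals1" corresponds to eigvals=(2, 4) here
    }
```

Applied to the last test ID found in each saved output file, this yields the (turbo, lower, overwrite) triple that was running when Python went down.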

An incomplete tally of the parameters (turbo, lower, and overwrite), where Python crashed before finishing all the tests, is as follows:

   5 False-False-False
  11 False-False-True
  13 False-True-False
   6 False-True-True
   7 True-False-False
   4 True-False-True
  15 True-True-True

The combination (turbo=True, lower=True, overwrite=False) is the only one of the 2^3 = 8 cases that has not shown up in a crash yet.
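As a quick sanity check on the tally, the sketch below enumerates all eight flag combinations and subtracts the ones listed above. The counts are copied from the table; the variable names are only illustrative.

```python
from itertools import product

# (turbo, lower, overwrite) -> number of runs that crashed on that combination,
# copied from the tally above.
crash_counts = {
    (False, False, False): 5,
    (False, False, True): 11,
    (False, True, False): 13,
    (False, True, True): 6,
    (True, False, False): 7,
    (True, False, True): 4,
    (True, True, True): 15,
}

all_combos = set(product([False, True], repeat=3))
missing = sorted(all_combos - set(crash_counts))
total = sum(crash_counts.values())

print("combinations never seen in a crash:", missing)
print("crashes accounted for in the tally:", total)
```

The tally accounts for 61 crashes spread over seven combinations with no obvious preference, which fits the impression that the failure does not depend on these three flags.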

Reproducing code example:

./runtests.py -vt scipy/linalg/tests/test_decomp.py::TestEigh::test_eigh

Scipy/Numpy/Python version information:

SciPy master branch as of ae34ce48, NumPy 1.18.1, Python 3.7.6, conda on macOS with MKL 2019.4.

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 52 (52 by maintainers)

Top GitHub Comments

2 reactions
ilayn commented, Apr 29, 2020

The Intel team confirmed the bug and included the fix in the upcoming MKL 2020 update 2.

2 reactions
oleksandr-pavlyk commented, Apr 6, 2020

@ilayn Done.
