jax.scipy.linalg routines segfault on Mac OS X on scipy 1.2.1 or later but not scipy 1.1.0
I’m not exactly sure why this happens, being unfamiliar with the internal architecture, but on macOS with Python 3.6.8, the following code segfaults if SciPy 1.2.1 is installed (the version that comes by default when you pip install jax jaxlib):
import jax.random as random
import jax.scipy.linalg as linalg
key = random.PRNGKey(42)
# For some reason, matrices smaller than (50, 50) or so do not trigger segfaults
X = random.normal(key, (500, 500))
A = X @ X.T # Drawn from standard Wishart distribution
linalg.cholesky(A)
print("Success!")
Output:
$ python -W ignore test.py
zsh: bus error python -W ignore test.py
If I roll back to SciPy 1.1.0, everything works:
$ python -W ignore test.py
Success!
This is a great project, by the way. Thanks for working on it!
Edit: after further digging, I found the following in the SciPy 1.2 release notes:
scipy.linalg.lapack now exposes the LAPACK routines using the Rectangular Full Packed storage (RFP) for upper triangular, lower triangular, symmetric, or Hermitian matrices; the upper trapezoidal fat matrix RZ decomposition routines are now available as well.
Perhaps this has something to do with it?
Even more edits: yet more digging has revealed scipy/scipy#9751, which hints that this might be caused by a specific (old) version of Xcode. I will report back once Xcode is upgraded.
#2927 should fix this bug; it requires a new jaxlib, so you can either build from source or wait for a release, which we will most likely make next week.
I think I’ve figured out what’s going wrong here, and why it’s Mac OS specific.
The problem is that we run out of stack space and crash due to a stack overflow. Mac OS thread stacks default to 512KiB, whereas Linux defaults to 8MiB stacks. (Since these threads are part of a thread pool, you cannot work around this by changing ulimit; it requires code changes.)
I’m not quite sure what the best way to fix this is at the moment but I’ll figure something out.
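To illustrate the point that a thread's stack size is chosen by the code that creates the thread, here is a minimal Python sketch. It is not the jaxlib fix: the thread pool in question presumably lives in native code, which this does not touch. run_with_stack_size is a hypothetical helper name; threading.stack_size is the standard-library call that sets the stack size used for threads created afterwards.

```python
import threading


def run_with_stack_size(fn, stack_bytes):
    """Run fn() in a new thread created with a stack of stack_bytes bytes.

    Hypothetical helper for illustration only; it affects Python threads,
    not thread pools created inside native extensions.
    """
    result = {}

    def target():
        result["value"] = fn()

    # The setting applies only to threads created after this call.
    previous = threading.stack_size(stack_bytes)
    try:
        worker = threading.Thread(target=target)
        worker.start()
    finally:
        # Restore the earlier setting so other threads are unaffected.
        threading.stack_size(previous)
    worker.join()
    return result.get("value")


if __name__ == "__main__":
    # Request an 8 MiB stack, matching the Linux default mentioned above.
    print(run_with_stack_size(lambda: sum(range(10)), 8 * 1024 * 1024))
```

The size has to be requested before the thread starts, which is why a pool that creates its own workers needs a change in the pool's code rather than in the caller's environment.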