Memory corruption with JAX>=0.2.27
In the testing suite for Veros, we test for consistency with a Fortran reference code via f2py. For this, we essentially do this:
- Do all variable setup in Python / JAX
- Clone state to Fortran by doing getattr(fortran_module, var_name)[...] = getattr(python_module, var_name)
- Run Python function that is being tested
- Run corresponding Fortran function
- Compare
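For readers who want to see the shape of this procedure, here is a minimal sketch of the five steps. It is not the actual Veros test code: the f2py module is replaced by a plain namespace holding Fortran-ordered NumPy buffers, and the names python_state, fortran_module, python_kernel, fortran_kernel, temp, salt and out are all hypothetical stand-ins.

```python
import types

import numpy as np


def clone_state(src, dst, var_names):
    """Copy each named array from src into dst's existing buffer, in place."""
    for name in var_names:
        # np.asarray converts the source (which could be a JAX array) to a
        # host-side NumPy array before the in-place assignment.
        getattr(dst, name)[...] = np.asarray(getattr(src, name))


def count_nans(module, var_names):
    """Return the number of NaN entries per variable, as a quick sanity check."""
    return {name: int(np.isnan(getattr(module, name)).sum()) for name in var_names}


def python_kernel(state):
    # Hypothetical stand-in for the Python / JAX function under test.
    return state.temp * 2.0 + state.salt


def fortran_kernel(module):
    # Hypothetical stand-in for the f2py-wrapped routine, which would write
    # its result into a module-level array.
    module.out[...] = module.temp * 2.0 + module.salt


# Step 1: set up all variables on the Python side.
python_state = types.SimpleNamespace(
    temp=np.linspace(0.0, 1.0, 6).reshape(2, 3),
    salt=np.full((2, 3), 35.0),
)

# Stand-in for the Fortran module: preallocated, column-major buffers.
fortran_module = types.SimpleNamespace(
    temp=np.zeros((2, 3), order="F"),
    salt=np.zeros((2, 3), order="F"),
    out=np.zeros((2, 3), order="F"),
)

names = ["temp", "salt"]

# Step 2: clone the state and confirm nothing was corrupted on the way in.
clone_state(python_state, fortran_module, names)
nan_counts = count_nans(fortran_module, names)

# Steps 3 and 4: run both implementations.
python_result = python_kernel(python_state)
fortran_kernel(fortran_module)

# Step 5: compare.
np.testing.assert_allclose(fortran_module.out, python_result)
```

Checking for NaNs right after the clone step, as count_nans does here, helps separate "bad data was copied in" from "the buffer was modified after the copy".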
Starting with JAX 0.2.27 I noticed that one test fails sometimes (~ every 3rd run) because the Fortran result has some NaNs sprinkled into it. I do not see this with older JAX versions or when using NumPy instead of JAX, so I have to believe that the JAX / XLA runtime is somehow meddling with the memory used by the Fortran routine. I have verified that what is being copied to Fortran (in step 2) is consistent between backends and does not trigger NaN outputs by itself. I have only tested this on CPU.
I’m at a loss how to debug this further. Maybe someone has a hunch what changed in 0.2.27 that could cause this?
Issue Analytics
- State:
- Created 2 years ago
- Comments:14 (8 by maintainers)
@soraros As I understand it, there’s an old out-of-tree flang that is more complete but depends on a really old LLVM fork, and its in-tree successor (also flang) that can’t yet generate code. The classic situation: choose either “deprecated” or “not ready yet”. But it looks like progress is happening!
Thanks, that’s great, I can run that. I bisected it to the commit that enabled MLIR (https://github.com/google/jax/commit/5801079a4b28874f607515c1131e75caacef5e39).
You can work around the problem even with a newer jax by setting JAX_ENABLE_MLIR=0, so hopefully that unblocks you.
I’m hoping to remove the non-MLIR path sometime soon, but not until all the bugs have been shaken out.
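One way to apply the JAX_ENABLE_MLIR=0 workaround from inside a test run is sketched below. It assumes the variable is picked up when jax is first imported, so it is set before any import of jax; that timing is an assumption rather than something stated in this thread, and exporting the variable in the shell before launching Python avoids the question entirely.

```python
import os

# Set the flag before jax is imported anywhere in the process.
# Assumption: JAX reads this environment variable at import time.
os.environ["JAX_ENABLE_MLIR"] = "0"

# Import jax only after the variable is in place, for example:
# import jax

mlir_flag = os.environ.get("JAX_ENABLE_MLIR")
```

In a pytest suite, the natural place for this would be the top of conftest.py, ahead of any module that imports jax.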