Memory corruption with JAX>=0.2.27

In the Veros test suite, we check for consistency with a Fortran reference implementation that is called via f2py. To do this, we essentially:

  1. Do all variable setup in Python / JAX
  2. Clone state to Fortran by doing getattr(fortran_module, var_name)[...] = getattr(python_module, var_name)
  3. Run Python function that is being tested
  4. Run corresponding Fortran function
  5. Compare
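A minimal sketch of steps 2 and 5, for readers who want to reproduce the setup. It assumes that the f2py-built module exposes each Fortran module variable as a NumPy array backed by Fortran-owned memory, so writing through `[...]` fills that buffer in place instead of rebinding the attribute. `clone_state_to_fortran`, `compare` and the `SimpleNamespace` stand-ins are hypothetical illustrations, not Veros code. `compare` reports NaNs separately so that this failure mode is easy to spot.

```python
import types

import numpy as np


def clone_state_to_fortran(fortran_module, python_state, var_names):
    """Step 2: copy each variable into the (f2py) module's arrays in place."""
    for name in var_names:
        # np.asarray turns a JAX or NumPy array into a host NumPy array.
        src = np.asarray(getattr(python_state, name))
        # Assigning through [...] writes into the existing buffer rather than
        # replacing the attribute, so the Fortran side sees the new values.
        getattr(fortran_module, name)[...] = src


def compare(python_result, fortran_result, rtol=1e-7, atol=0.0):
    """Step 5: return (all values close?, number of NaNs on the Fortran side)."""
    py = np.asarray(python_result)
    ft = np.asarray(fortran_result)
    nan_count = int(np.isnan(ft).sum())
    # allclose treats NaN as unequal to everything (equal_nan defaults to False).
    close = np.allclose(py, ft, rtol=rtol, atol=atol)
    return close, nan_count


# Hypothetical stand-ins: a real test would use the f2py module and the Veros state.
fortran_module = types.SimpleNamespace(u=np.zeros(4), temp=np.zeros((2, 3)))
python_state = types.SimpleNamespace(u=np.arange(4.0), temp=np.ones((2, 3)))

clone_state_to_fortran(fortran_module, python_state, ["u", "temp"])
print(compare(python_state.u, fortran_module.u))
```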

Starting with JAX 0.2.27, I noticed that one test fails intermittently (roughly every third run) because the Fortran result contains scattered NaNs. I do not see this with older JAX versions or when using NumPy instead of JAX, so I suspect that the JAX / XLA runtime is somehow interfering with the memory used by the Fortran routine. I have verified that what is copied to Fortran in step 2 is consistent between backends and does not produce NaN outputs by itself. I have only tested this on CPU.
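Because the failure appears only about once in three runs, a single passing run says little about whether a given version is affected. The following is a small sketch (helper names are illustrative) that estimates how many consecutive clean runs are needed before "no failure" becomes convincing. It assumes that runs fail independently with a fixed probability `p_fail`. With `p_fail = 1/3` and a 1% tolerance for missing the bug, the answer is 12, because (2/3)^12 ≈ 0.0077 while (2/3)^11 ≈ 0.0116.

```python
import math


def runs_needed(p_fail, miss_prob=0.01):
    """Smallest n with (1 - p_fail) ** n <= miss_prob.

    That is, the chance that n independent runs all pass even though the bug
    is present is at most miss_prob.
    """
    return math.ceil(math.log(miss_prob) / math.log(1.0 - p_fail))


def count_failures(run_once, n):
    """Call run_once() n times; run_once should return True when the test passes."""
    return sum(1 for _ in range(n) if not run_once())


print(runs_needed(1 / 3))
```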

I’m at a loss as to how to debug this further. Does anyone have a hunch about what changed in 0.2.27 that could cause this?

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 14 (8 by maintainers)

Top GitHub Comments

hawkinsp commented, Feb 24, 2022

@soraros As I understand it, there’s an old out-of-tree flang that is more complete but depends on a really old LLVM fork, and its in-tree successor (also flang) that can’t yet generate code. The classic situation: choose either “deprecated” or “not ready yet”. But it looks like progress is happening!

hawkinsp commented, Feb 22, 2022

Thanks, that’s great, I can run that. I bisected it to the commit that enabled MLIR (https://github.com/google/jax/commit/5801079a4b28874f607515c1131e75caacef5e39).
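Bisecting an intermittent failure requires a flaky-aware "is this commit bad?" check. The following is a minimal sketch, not the procedure used in this thread. `run_test` is a hypothetical callable that checks out and builds a given commit, then runs the failing test once, returning True on a pass. A commit counts as bad as soon as any of `k` runs fails. A "good" verdict is only probabilistic, so `k` should be chosen with the observed failure rate in mind. `git bisect run` automates the same binary search over real commits. The search takes about log2(N) steps, each costing up to `k` test runs.

```python
def make_is_bad(run_test, k):
    """Return a predicate that marks a commit bad if any of k runs fails."""
    def is_bad(commit):
        return any(not run_test(commit) for _ in range(k))
    return is_bad


def first_bad(commits, is_bad):
    """Binary search for the first bad commit.

    Assumes commits are in chronological order, commits[0] is good,
    commits[-1] is bad, and every commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


# Toy example: "commit" 6 introduces a failure (run_test passes only before 6).
commits = list(range(10))
print(first_bad(commits, make_is_bad(lambda c: c < 6, k=3)))
```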

You can work around the problem even with a newer jax by setting JAX_ENABLE_MLIR=0, so hopefully that unblocks you.
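A sketch of applying this workaround from inside a test suite, assuming a JAX release from this period that still honors `JAX_ENABLE_MLIR`. Later releases removed the non-MLIR path, so the variable will no longer have any effect there. The variable has to be set before JAX is first imported, because JAX reads its configuration defaults from the environment when it loads. A natural place is the top of a pytest `conftest.py` (an assumption about the project layout). Running `JAX_ENABLE_MLIR=0 pytest` from the shell achieves the same thing.

```python
import os

# Disable the MLIR lowering path that the bisection pointed to.
# This must run before the first `import jax` anywhere in the process.
os.environ["JAX_ENABLE_MLIR"] = "0"

# import jax  # import JAX only after the variable has been set
```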

I’m hoping to remove the non-MLIR path sometime soon, but not until all the bugs have been shaken out.
