RuntimeError: INTERNAL: Core halted unexpectedly: No error message available as no compiler metadata was provided.
The script below used to run normally on a Cloud TPU v2-8 VM, but it now fails with an error:
import os
os.environ['XLA_PYTHON_CLIENT_ALLOCATOR'] = 'platform'

import jax
import subprocess

np = jax.numpy
devices = jax.devices()

def show_mem(result: np.ndarray) -> str:
    result.block_until_ready()
    jax.profiler.save_device_memory_profile('/tmp/memory.prof')
    return subprocess.run(['go', 'tool', 'pprof', '-tags', '/tmp/memory.prof'], stdout=subprocess.PIPE, stderr=subprocess.DEVNULL).stdout.decode('utf-8')

def largest_v2() -> np.ndarray:
    return np.zeros((1024, 1024, 957, 2), dtype=np.float32)

# print(show_mem(largest_v2()))
print(show_mem(jax.jit(largest_v2, device=devices[1])()))
print(show_mem(jax.jit(largest_v2, device=devices[2])()))
Error message:
$ python test_memory.py
device: Total 7.5GB
7.5GB ( 100%): TPU_1(process=0,(0,0,0,1))
kind: Total 7.5GB
7.5GB ( 100%): buffer
-1.0B (1.2e-08%): executable
2022-02-19 21:29:36.266338: W external/org_tensorflow/tensorflow/stream_executor/stream.cc:275] Error blocking host until done in stream destructor: INTERNAL: stream did not block host until done; was already in an error state
Traceback (most recent call last):
  File "test_memory.py", line 22, in <module>
    print(show_mem(jax.jit(largest_v2, device=devices[2])()))
  File "test_memory.py", line 12, in show_mem
    result.block_until_ready()
RuntimeError: INTERNAL: Core halted unexpectedly: No error message available as no compiler metadata was provided.
2022-02-19 21:29:36.404625: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/local_device_state.cc:74] Error when closing device: INTERNAL: Core halted unexpectedly: No error message available as no compiler metadata was provided.
2022-02-19 21:29:36.404907: W external/org_tensorflow/tensorflow/stream_executor/stream.cc:275] Error blocking host until done in stream destructor: INTERNAL: stream did not block host until done; was already in an error state
2022-02-19 21:29:36.405494: W external/org_tensorflow/tensorflow/stream_executor/stream.cc:275] Error blocking host until done in stream destructor: INTERNAL: stream did not block host until done; was already in an error state
Library versions:
$ pip list | grep jax
jax 0.3.1
jaxlib 0.3.0
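As a sanity check on the numbers in this report, the size of the array returned by largest_v2() can be worked out by hand. The sketch below uses only the shape and dtype from the script; the helper name buffer_bytes is made up for illustration. The result is consistent with the 7.5GB total in the profile output, and it sits close to the 8 GiB of HBM usually quoted per TPU v2 core (that per-core figure is an assumption here, not something stated in the report).

```python
import math

# Shape and dtype copied from largest_v2() in the script above.
shape = (1024, 1024, 957, 2)
bytes_per_element = 4  # float32


def buffer_bytes(shape, itemsize):
    """Hypothetical helper: bytes needed for a dense array of this shape."""
    return math.prod(shape) * itemsize


size_bytes = buffer_bytes(shape, bytes_per_element)
size_gib = size_bytes / 2**30

print(f"{size_bytes} bytes = {size_gib:.2f} GiB")
```

This only accounts for the raw buffer; any padding or extra working memory the compiler adds on device is not modeled.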
Issue Analytics
- State:
- Created 2 years ago
- Comments:11 (4 by maintainers)
Thanks for the speedy replies!

For the libtpu-nightly==0.1.dev20220218 SIGABRT failure, please feel free to report that kind of thing here! In this case, we’re already aware of the issue and should have a fixed libtpu-nightly out soon (apologies for suggesting you try it, I forgot about this issue).

Thanks also for isolating where the Core halted unexpectedly error began. This will help with debugging.

I’m using different code but encountered the same error message. Here’s my JAX and libtpu version:

I’ve attached my tpu_driver.INFO in this gist.
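When reporting versions in a thread like this one, a short script can collect them in one place. This is a minimal sketch using the standard library's importlib.metadata; the package names are the ones mentioned in this thread, and the helper name installed_version is hypothetical. A package that is not installed under that name is reported as such rather than raising.

```python
from importlib import metadata

# Package names taken from this thread; adjust if your install differs.
PACKAGES = ("jax", "jaxlib", "libtpu-nightly")


def installed_version(package: str) -> str:
    """Return the installed version of a package, or a placeholder if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return "not installed"


versions = {name: installed_version(name) for name in PACKAGES}
for name, version in versions.items():
    print(f"{name}=={version}")
```

The output can be pasted directly into an issue comment alongside the error message.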