Can't find ptxas binary in ${CUDA_DIR}/bin.

Description

I tried to run the Reformer model with the configuration reformer_enwik8.gin and got an error: Can't find ptxas binary in ${CUDA_DIR}/bin. …

Environment information

OS: Ubuntu 18.04.3 LTS

$ pip freeze | grep tensor
mesh-tensorflow==0.1.7
tensor2tensor==1.15.4
tensorboard==1.15.0
tensorflow-datasets==1.3.2
tensorflow-estimator==1.15.1
tensorflow-gan==2.0.0
tensorflow-gpu==1.15.0
tensorflow-hub==0.7.0
tensorflow-metadata==0.15.2
tensorflow-probability==0.7.0
tensorrt==6.0.1.4

$ pip freeze | grep jax
jax==0.1.57
jaxlib==0.1.37

$ python -V
python 3.6.8

$ nvcc --version
CUDA 10.0 (/usr/local/cuda points to /usr/local/cuda-10.0; /usr/local/cuda-10.1 is also installed)

GPU: 4 × 2080 Ti

For bugs: reproduction and error logs

# Steps to reproduce:
Run trainer.py in trax/trax with the configuration reformer_enwik8.gin.

# Error logs:
[Routine dataset log lines removed here]
I0119 09:32:55.178084 140128464549696 problem.py:651] Reading data files from /root/tensorflow_datasets/t2t_enwik8_l65k/enwik8_l65k-dev*
INFO:tensorflow:partition: 0 num_data_files: 1
I0119 09:32:55.179685 140128464549696 problem.py:677] partition: 0 num_data_files: 1
I0119 09:32:56.124050 140128464549696 inputs.py:443] Heuristically setting bucketing to False based on shapes of target tensors.
I0119 09:32:56.131589 140128464549696 inputs.py:443] Heuristically setting bucketing to False based on shapes of target tensors.
I0119 09:32:56.136316 140128464549696 inputs.py:443] Heuristically setting bucketing to False based on shapes of target tensors.
I0119 09:33:05.191175 140128464549696 trainer_lib.py:754] Model loaded from ../checkpoints/model.pkl at step 0
Model loaded from ../checkpoints/model.pkl at step 0
I0119 09:33:05.192780 140128464549696 trainer_lib.py:754] Step      0: Starting training using 1 devices
Step      0: Starting training using 1 devices
I0119 09:33:05.194077 140128464549696 trainer_lib.py:754] Step      0: Total number of trainable weights: 215865602
Step      0: Total number of trainable weights: 215865602

2020-01-19 09:33:09.105234: E external/org_tensorflow/tensorflow/core/platform/default/subprocess.cc:208] Start cannot fork() child process: Cannot allocate memory
2020-01-19 09:33:09.105464: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:73] Can't find ptxas binary in ${CUDA_DIR}/bin.  Will back to the GPU driver for PTX -> sass compilation.  This is OK so long as you don't see a warning below about an out-of-date driver version.
2020-01-19 09:33:09.105489: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:74] Searched for CUDA in the following directories:
2020-01-19 09:33:09.105517: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   ./cuda_sdk_lib
2020-01-19 09:33:09.105532: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   /usr/local/cuda
2020-01-19 09:33:09.105554: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   .
2020-01-19 09:33:09.105567: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:79] You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions.  For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.
2020-01-19 09:33:09.193084: E external/org_tensorflow/tensorflow/core/platform/default/subprocess.cc:208] Start cannot fork() child process: Cannot allocate memory
2020-01-19 09:33:09.193291: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:73] Can't find ptxas binary in ${CUDA_DIR}/bin.  Will back to the GPU driver for PTX -> sass compilation.  This is OK so long as you don't see a warning below about an out-of-date driver version.
2020-01-19 09:33:09.193319: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:74] Searched for CUDA in the following directories:
2020-01-19 09:33:09.193338: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   ./cuda_sdk_lib
2020-01-19 09:33:09.193354: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   /usr/local/cuda
2020-01-19 09:33:09.193384: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   .
2020-01-19 09:33:09.193418: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:79] You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions.  For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.
2020-01-19 09:33:09.345517: E external/org_tensorflow/tensorflow/core/platform/default/subprocess.cc:208] Start cannot fork() child process: Cannot allocate memory
2020-01-19 09:33:09.345708: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:73] Can't find ptxas binary in ${CUDA_DIR}/bin.  Will back to the GPU driver for PTX -> sass compilation.  This is OK so long as you don't see a warning below about an out-of-date driver version.
2020-01-19 09:33:09.345732: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:74] Searched for CUDA in the following directories:
2020-01-19 09:33:09.345749: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   ./cuda_sdk_lib
2020-01-19 09:33:09.345762: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   /usr/local/cuda
2020-01-19 09:33:09.345776: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   .
2020-01-19 09:33:09.345790: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:79] You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions.  For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.
2020-01-19 09:33:09.440697: E external/org_tensorflow/tensorflow/core/platform/default/subprocess.cc:208] Start cannot fork() child process: Cannot allocate memory
2020-01-19 09:33:09.440881: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:73] Can't find ptxas binary in ${CUDA_DIR}/bin.  Will back to the GPU driver for PTX -> sass compilation.  This is OK so long as you don't see a warning below about an out-of-date driver version.
2020-01-19 09:33:09.440903: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:74] Searched for CUDA in the following directories:
2020-01-19 09:33:09.440918: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   ./cuda_sdk_lib
2020-01-19 09:33:09.440930: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   /usr/local/cuda
2020-01-19 09:33:09.440941: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   .
2020-01-19 09:33:09.440954: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:79] You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions.  For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.
2020-01-19 09:33:09.545554: E external/org_tensorflow/tensorflow/core/platform/default/subprocess.cc:208] Start cannot fork() child process: Cannot allocate memory
2020-01-19 09:33:09.545752: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:73] Can't find ptxas binary in ${CUDA_DIR}/bin.  Will back to the GPU driver for PTX -> sass compilation.  This is OK so long as you don't see a warning below about an out-of-date driver version.
2020-01-19 09:33:09.545774: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:74] Searched for CUDA in the following directories:
2020-01-19 09:33:09.545791: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   ./cuda_sdk_lib
2020-01-19 09:33:09.545804: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   /usr/local/cuda
2020-01-19 09:33:09.545815: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   .
2020-01-19 09:33:09.545827: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:79] You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions.  For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.
2020-01-19 09:33:09.730990: E external/org_tensorflow/tensorflow/core/platform/default/subprocess.cc:208] Start cannot fork() child process: Cannot allocate memory
2020-01-19 09:33:09.731233: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:73] Can't find ptxas binary in ${CUDA_DIR}/bin.  Will back to the GPU driver for PTX -> sass compilation.  This is OK so long as you don't see a warning below about an out-of-date driver version.
2020-01-19 09:33:09.731260: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:74] Searched for CUDA in the following directories:
2020-01-19 09:33:09.731279: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   ./cuda_sdk_lib
2020-01-19 09:33:09.731293: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   /usr/local/cuda
2020-01-19 09:33:09.731305: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   .
2020-01-19 09:33:09.731319: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:79] You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions.  For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.
2020-01-19 09:33:10.081432: E external/org_tensorflow/tensorflow/core/platform/default/subprocess.cc:208] Start cannot fork() child process: Cannot allocate memory
2020-01-19 09:33:10.081621: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:73] Can't find ptxas binary in ${CUDA_DIR}/bin.  Will back to the GPU driver for PTX -> sass compilation.  This is OK so long as you don't see a warning below about an out-of-date driver version.
2020-01-19 09:33:10.081644: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:74] Searched for CUDA in the following directories:
2020-01-19 09:33:10.081659: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   ./cuda_sdk_lib
2020-01-19 09:33:10.081671: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   /usr/local/cuda
2020-01-19 09:33:10.081708: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   .
2020-01-19 09:33:10.081721: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:79] You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions.  For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.
2020-01-19 09:33:13.557328: E external/org_tensorflow/tensorflow/core/platform/default/subprocess.cc:208] Start cannot fork() child process: Cannot allocate memory
2020-01-19 09:33:13.557530: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:73] Can't find ptxas binary in ${CUDA_DIR}/bin.  Will back to the GPU driver for PTX -> sass compilation.  This is OK so long as you don't see a warning below about an out-of-date driver version.
2020-01-19 09:33:13.557552: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:74] Searched for CUDA in the following directories:
2020-01-19 09:33:13.557567: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   ./cuda_sdk_lib
2020-01-19 09:33:13.557578: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   /usr/local/cuda
2020-01-19 09:33:13.557589: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   .
2020-01-19 09:33:13.557601: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:79] You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions.  For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.
2020-01-19 09:33:13.633426: E external/org_tensorflow/tensorflow/core/platform/default/subprocess.cc:208] Start cannot fork() child process: Cannot allocate memory
2020-01-19 09:33:13.633613: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:73] Can't find ptxas binary in ${CUDA_DIR}/bin.  Will back to the GPU driver for PTX -> sass compilation.  This is OK so long as you don't see a warning below about an out-of-date driver version.
2020-01-19 09:33:13.633636: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:74] Searched for CUDA in the following directories:
2020-01-19 09:33:13.633651: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   ./cuda_sdk_lib
2020-01-19 09:33:13.633663: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   /usr/local/cuda
2020-01-19 09:33:13.633700: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   .
2020-01-19 09:33:13.633713: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:79] You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions.  For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.
2020-01-19 09:33:13.709584: E external/org_tensorflow/tensorflow/core/platform/default/subprocess.cc:208] Start cannot fork() child process: Cannot allocate memory
2020-01-19 09:33:13.709778: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:73] Can't find ptxas binary in ${CUDA_DIR}/bin.  Will back to the GPU driver for PTX -> sass compilation.  This is OK so long as you don't see a warning below about an out-of-date driver version.
2020-01-19 09:33:13.709801: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:74] Searched for CUDA in the following directories:
2020-01-19 09:33:13.709815: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   ./cuda_sdk_lib
2020-01-19 09:33:13.709826: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   /usr/local/cuda
2020-01-19 09:33:13.709839: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   .
2020-01-19 09:33:13.709876: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:79] You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions.  For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.
2020-01-19 09:33:14.256316: E external/org_tensorflow/tensorflow/core/platform/default/subprocess.cc:208] Start cannot fork() child process: Cannot allocate memory
2020-01-19 09:33:14.256517: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:73] Can't find ptxas binary in ${CUDA_DIR}/bin.  Will back to the GPU driver for PTX -> sass compilation.  This is OK so long as you don't see a warning below about an out-of-date driver version.
2020-01-19 09:33:14.256540: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:74] Searched for CUDA in the following directories:
2020-01-19 09:33:14.256556: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   ./cuda_sdk_lib
2020-01-19 09:33:14.256568: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   /usr/local/cuda
2020-01-19 09:33:14.256579: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   .
2020-01-19 09:33:14.256591: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:79] You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions.  For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.
2020-01-19 09:33:31.094227: E external/org_tensorflow/tensorflow/core/platform/default/subprocess.cc:208] Start cannot fork() child process: Cannot allocate memory
2020-01-19 09:33:31.094430: W external/org_tensorflow/tensorflow/stream_executor/gpu/redzone_allocator.cc:312] Internal: Failed to launch ptxas
Relying on driver to perform ptx compilation. This message will be only logged once.
2020-01-19 09:33:31.177827: E external/org_tensorflow/tensorflow/core/platform/default/subprocess.cc:208] Start cannot fork() child process: Cannot allocate memory
2020-01-19 09:33:31.255405: E external/org_tensorflow/tensorflow/core/platform/default/subprocess.cc:208] Start cannot fork() child process: Cannot allocate memory
Traceback (most recent call last):
  File "/home/xxx/pycharm_proj/trax/trax/trainer.py", line 195, in <module>
    app.run(main)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/home/xxx/pycharm_proj/trax/trax/trainer.py", line 189, in main
    trainer_lib.train(output_dir=output_dir)
  File "/usr/local/lib/python3.6/dist-packages/gin/config.py", line 1078, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/usr/local/lib/python3.6/dist-packages/gin/utils.py", line 49, in augment_exception_message_and_reraise
    six.raise_from(proxy.with_traceback(exception.__traceback__), None)
  File "<string>", line 3, in raise_from
  File "/usr/local/lib/python3.6/dist-packages/gin/config.py", line 1055, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "/home/xxx/pycharm_proj/trax/trax/supervised/trainer_lib.py", line 641, in train
    trainer.train_epoch(epoch_steps, eval_steps)
  File "/home/xxx/pycharm_proj/trax/trax/supervised/trainer_lib.py", line 305, in train_epoch
    self.train_step(batch)
  File "/home/xxx/pycharm_proj/trax/trax/supervised/trainer_lib.py", line 337, in train_step
    self._step, opt_state, batch, self._model_state, self._rngs)
  File "/usr/local/lib/python3.6/dist-packages/jax/api.py", line 149, in f_jitted
    out = xla.xla_call(flat_fun, *args_flat, device=device, backend=backend)
  File "/usr/local/lib/python3.6/dist-packages/jax/core.py", line 602, in call_bind
    outs = primitive.impl(f, *args, **params)
  File "/usr/local/lib/python3.6/dist-packages/jax/interpreters/xla.py", line 442, in _xla_call_impl
    compiled_fun = _xla_callable(fun, device, backend, *map(arg_spec, args))
  File "/usr/local/lib/python3.6/dist-packages/jax/linear_util.py", line 223, in memoized_fun
    ans = call(fun, *args)
  File "/usr/local/lib/python3.6/dist-packages/jax/interpreters/xla.py", line 499, in _xla_callable
    compiled = built.Compile(compile_options=options, backend=xb.get_backend(backend))
  File "/usr/local/lib/python3.6/dist-packages/jaxlib/xla_client.py", line 609, in Compile
    return backend.compile(self.computation, compile_options)
  File "/usr/local/lib/python3.6/dist-packages/jaxlib/xla_client.py", line 161, in compile
    compile_options.device_assignment)
RuntimeError: Internal: Failed to launch ptxas
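
The warnings above suggest pointing XLA at the CUDA install through XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda. Below is a minimal sketch of that suggestion, not a confirmed fix for this issue. It checks which of the directories listed in the log actually contain bin/ptxas, then builds the flag. The helper names find_cuda_dir_with_ptxas and with_cuda_data_dir are hypothetical, and setting the variable before jax is imported is an assumption about when the flags are read.

```python
import os
from pathlib import Path
from typing import Iterable, Optional

# Directories the log above reports XLA searched, in the order logged.
SEARCHED_DIRS = ("./cuda_sdk_lib", "/usr/local/cuda", ".")


def find_cuda_dir_with_ptxas(candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate directory containing an executable bin/ptxas."""
    for directory in candidates:
        ptxas = Path(directory) / "bin" / "ptxas"
        if ptxas.is_file() and os.access(ptxas, os.X_OK):
            return str(directory)
    return None


def with_cuda_data_dir(existing_flags: str, cuda_dir: str) -> str:
    """Append --xla_gpu_cuda_data_dir (the flag named in the log) to XLA_FLAGS."""
    flag = f"--xla_gpu_cuda_data_dir={cuda_dir}"
    return f"{existing_flags} {flag}".strip()


if __name__ == "__main__":
    cuda_dir = find_cuda_dir_with_ptxas(SEARCHED_DIRS)
    if cuda_dir is None:
        print("ptxas not found in:", ", ".join(SEARCHED_DIRS))
    else:
        # Assumption: set this before importing jax so the backend sees it.
        os.environ["XLA_FLAGS"] = with_cuda_data_dir(
            os.environ.get("XLA_FLAGS", ""), cuda_dir
        )
        print("XLA_FLAGS =", os.environ["XLA_FLAGS"])
```

Note that every ptxas warning in the log is preceded by "Start cannot fork() child process: Cannot allocate memory". If ptxas is present under /usr/local/cuda/bin and the check above finds it, the failure may come from the process being unable to fork rather than from a wrong search path, in which case changing XLA_FLAGS alone would probably not help.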