"DNN library is not found." error when tensorflow is loaded before JAX
Please:
- Check for duplicate issues.
- Provide a complete example of how to reproduce the bug, wrapped in triple backticks like this:
```python
import jax.numpy as jnp
import tensorflow_datasets as tfds
from flax import linen as nn
from jax import random

# See https://github.com/tensorflow/tensorflow/issues/53831.
train_ds = tfds.load("cifar10", split="train", as_supervised=True)

model = nn.Conv(features=1, kernel_size=(3, 3), strides=(1, 1))
params = model.init(random.PRNGKey(123), jnp.zeros((1, 32, 32, 3)))
```
gives me an error:
RuntimeError: UNKNOWN: Failed to determine best cudnn convolution algorithm for:
%cudnn-conv = (f32[1,32,32,1]{2,1,3,0}, u8[0]{0}) custom-call(f32[1,32,32,3]{2,1,3,0} %copy.3, f32[3,3,3,1]{1,0,2,3} %copy.4), window={size=3x3 pad=1_1x1_1}, dim_labels=b01f_01io->b01f, custom_call_target="__cudnn$convForward", metadata={op_type="conv_general_dilated" op_name="jit(conv_general_dilated)/conv_general_dilated[\n batch_group_count=1\n dimension_numbers=ConvDimensionNumbers(lhs_spec=(0, 3, 1, 2), rhs_spec=(3, 2, 0, 1), out_spec=(0, 3, 1, 2))\n feature_group_count=1\n lhs_dilation=(1, 1)\n lhs_shape=(1, 32, 32, 3)\n padding=((1, 1), (1, 1))\n precision=None\n preferred_element_type=None\n rhs_dilation=(1, 1)\n rhs_shape=(3, 3, 3, 1)\n window_strides=(1, 1)\n]" source_file="/nix/store/ys9bmmwpdqf3vlgxjvfy770qdk4dcf1n-python3.9-flax-0.3.6/lib/python3.9/site-packages/flax/linen/linear.py" source_line=282}, backend_config="{\"conv_result_scale\":1,\"activation_mode\":\"0\",\"side_input_scale\":0}"
Original error: UNIMPLEMENTED: DNN library is not found.
But if I force TF to run on CPU with
```python
import tensorflow as tf

tf.config.set_visible_devices([], 'GPU')

import jax.numpy as jnp
import tensorflow_datasets as tfds
from flax import linen as nn
from jax import random

# See https://github.com/tensorflow/tensorflow/issues/53831.
train_ds = tfds.load("cifar10", split="train", as_supervised=True)

model = nn.Conv(features=1, kernel_size=(3, 3), strides=(1, 1))
params = model.init(random.PRNGKey(123), jnp.zeros((1, 32, 32, 3)))
```
Then it works!
Why does TF having access to the GPU affect JAX’s ability to locate cuDNN?
Here’s my shell.nix for complete reproducibility: https://gist.github.com/samuela/319059b88a46a994b4c10dfa718f379e And here’s a relevant comment on another issue: https://github.com/NixOS/nixpkgs/pull/158186#issuecomment-1030486912
- If applicable, include full error messages/tracebacks.
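On the question of why TF's GPU access affects JAX: the thread does not pin down the root cause. One plausible explanation, which is an assumption and not something confirmed here, is that TensorFlow reserves most of the GPU memory by default, leaving too little for JAX to initialize cuDNN. Under that assumption, a less drastic variant of the workaround keeps the GPU visible to TF but asks it to allocate memory on demand. The sketch below uses `tf.config.experimental.set_memory_growth`; the helper name `enable_tf_memory_growth` is hypothetical, and the import is guarded so the snippet also runs where TensorFlow is not installed.

```python
def enable_tf_memory_growth():
    """Ask TensorFlow to allocate GPU memory on demand instead of up front.

    Returns the number of GPUs configured, or None if TensorFlow is not
    installed. This must run before TensorFlow initializes its GPUs.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None  # TensorFlow is not installed; nothing to configure.

    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        # Raises RuntimeError if this GPU has already been initialized.
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)


# Call this before tfds.load(...) or any other TF op that touches the GPU.
configured = enable_tf_memory_growth()
print(f"GPUs configured for memory growth: {configured}")
```

Whether this is enough in the Nix environment from this report is untested; hiding the GPU from TF, as in the snippet above, remains the workaround that the report confirms.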
I’m going to close this issue because there are already a few open issues about making this error message better.
Ah, I see. I still find the error message confusing, since cuDNN is found but fails to initialize. But I think I can get things working from here.
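If the initialization failure is caused by GPU memory pressure (an assumption; the thread does not confirm it), JAX's own allocation behaviour can also be adjusted through the documented environment variables `XLA_PYTHON_CLIENT_PREALLOCATE` and `XLA_PYTHON_CLIENT_MEM_FRACTION`. The following is a minimal sketch; `configure_jax_gpu_memory` is a hypothetical helper name, and the variables only take effect if set before JAX initializes its GPU backend.

```python
import os


def configure_jax_gpu_memory(preallocate=False, mem_fraction=None):
    """Set JAX GPU memory environment variables for the current process.

    Must be called before JAX initializes its backend. Returns the
    resulting settings as a dict for inspection.
    """
    os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "true" if preallocate else "false"

    if mem_fraction is not None:
        if not 0.0 < mem_fraction <= 1.0:
            raise ValueError("mem_fraction must be in the interval (0, 1]")
        os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = f"{mem_fraction:.2f}"

    keys = ("XLA_PYTHON_CLIENT_PREALLOCATE", "XLA_PYTHON_CLIENT_MEM_FRACTION")
    return {key: os.environ[key] for key in keys if key in os.environ}


# Example: disable preallocation and cap JAX at half of the GPU memory.
settings = configure_jax_gpu_memory(preallocate=False, mem_fraction=0.5)
print(settings)
```

These settings only limit what JAX requests. If TensorFlow has already claimed the GPU memory, TF's usage still needs to be restricted as well.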