"DNN library is not found." error when tensorflow is loaded before JAX


Please:

  • Check for duplicate issues.
  • Provide a complete example of how to reproduce the bug, wrapped in triple backticks like this:
import jax.numpy as jnp
import tensorflow_datasets as tfds
from flax import linen as nn
from jax import random

# See https://github.com/tensorflow/tensorflow/issues/53831.
train_ds = tfds.load("cifar10", split="train", as_supervised=True)

model = nn.Conv(features=1, kernel_size=(3, 3), strides=(1, 1))
params = model.init(random.PRNGKey(123), jnp.zeros((1, 32, 32, 3)))

Running this gives me an error:

RuntimeError: UNKNOWN: Failed to determine best cudnn convolution algorithm for:
%cudnn-conv = (f32[1,32,32,1]{2,1,3,0}, u8[0]{0}) custom-call(f32[1,32,32,3]{2,1,3,0} %copy.3, f32[3,3,3,1]{1,0,2,3} %copy.4), window={size=3x3 pad=1_1x1_1}, dim_labels=b01f_01io->b01f, custom_call_target="__cudnn$convForward", metadata={op_type="conv_general_dilated" op_name="jit(conv_general_dilated)/conv_general_dilated[\n  batch_group_count=1\n  dimension_numbers=ConvDimensionNumbers(lhs_spec=(0, 3, 1, 2), rhs_spec=(3, 2, 0, 1), out_spec=(0, 3, 1, 2))\n  feature_group_count=1\n  lhs_dilation=(1, 1)\n  lhs_shape=(1, 32, 32, 3)\n  padding=((1, 1), (1, 1))\n  precision=None\n  preferred_element_type=None\n  rhs_dilation=(1, 1)\n  rhs_shape=(3, 3, 3, 1)\n  window_strides=(1, 1)\n]" source_file="/nix/store/ys9bmmwpdqf3vlgxjvfy770qdk4dcf1n-python3.9-flax-0.3.6/lib/python3.9/site-packages/flax/linen/linear.py" source_line=282}, backend_config="{\"conv_result_scale\":1,\"activation_mode\":\"0\",\"side_input_scale\":0}"

Original error: UNIMPLEMENTED: DNN library is not found.

But if I force TF to run on CPU with

import tensorflow as tf

tf.config.set_visible_devices([], 'GPU')

import jax.numpy as jnp
import tensorflow_datasets as tfds
from flax import linen as nn
from jax import random

# See https://github.com/tensorflow/tensorflow/issues/53831.
train_ds = tfds.load("cifar10", split="train", as_supervised=True)

model = nn.Conv(features=1, kernel_size=(3, 3), strides=(1, 1))
params = model.init(random.PRNGKey(123), jnp.zeros((1, 32, 32, 3)))

Then it works!
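The workaround above can be wrapped in a small helper that is called before any TensorFlow operation runs. This is a minimal sketch, not code from the issue: the function name `hide_gpus_from_tensorflow` is hypothetical, and the `ImportError` and `RuntimeError` guards are added assumptions so that the script still runs where TensorFlow is missing or has already initialized its devices.

```python
def hide_gpus_from_tensorflow():
    """Keep TensorFlow on CPU so it does not claim the GPU before JAX.

    Hypothetical helper (not from the original issue). Returns True if
    TensorFlow was found and its GPUs were hidden, False otherwise.
    """
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed, so there is nothing to hide.
        return False
    try:
        # Same call as in the workaround above. It has to run before
        # TensorFlow initializes its devices.
        tf.config.set_visible_devices([], "GPU")
    except RuntimeError:
        # TensorFlow raises RuntimeError once its runtime is initialized.
        return False
    return True
```

As in the snippet above, the call would go ahead of `tfds.load(...)`; a `False` return value signals that the GPU may still be visible to TF.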

Why does TF having access to the GPU affect JAX’s ability to locate cuDNN?

Here’s my shell.nix for complete reproducibility: https://gist.github.com/samuela/319059b88a46a994b4c10dfa718f379e

And here’s a relevant comment on another issue: https://github.com/NixOS/nixpkgs/pull/158186#issuecomment-1030486912

  • If applicable, include full error messages/tracebacks.

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

1 reaction
mattjj commented, Mar 16, 2022

I’m going to close this issue because there are already a few open that are about making this error message better.

0 reactions
samuela commented, Feb 5, 2022

Ah, I see. I still find the error message confusing since cuDNN is found, just does not succeed in initializing. But I think I can get things working from here.
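If cuDNN is found but fails to initialize, one possible cause when two frameworks share a GPU is that TF has already reserved most of the device memory. That is an assumption here; the excerpt above does not confirm it. Under that assumption, an alternative to hiding the GPU from TF entirely is to ask TF to allocate GPU memory on demand. The sketch below uses `tf.config.list_physical_devices` and `tf.config.experimental.set_memory_growth`. The helper name `enable_tensorflow_memory_growth` is hypothetical, and this approach has not been verified against the setup in this issue.

```python
def enable_tensorflow_memory_growth():
    """Ask TensorFlow to allocate GPU memory on demand instead of up front.

    Hypothetical helper, not from the issue thread. Returns the number of
    GPUs that were configured (0 if TensorFlow is missing or sees no GPU).
    """
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed, so there is nothing to configure.
        return 0
    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        # Has to run before TensorFlow initializes the device; otherwise
        # TensorFlow raises RuntimeError, which is left to propagate here.
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)
```

Like the CPU-only workaround, this would need to run before the first TF operation that touches the GPU.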
