Load Failure around Checkpoint Reading

Hi! I'm bumping into some problems trying to run this. If helpful, my setup is a GTX 1080 using CUDA 7.5 with TensorFlow v0.8 on Ubuntu 14.04.

With a folder of my own images, I'm getting:

I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally
{'batch_size': 64,
 'beta1': 0.5,
 'checkpoint_dir': 'checkpoint',
 'dataset': 'doom2graphics',
 'epoch': 25,
 'image_size': 108,
 'is_crop': False,
 'is_train': True,
 'learning_rate': 0.0002,
 'sample_dir': 'samples',
 'train_size': inf,
 'visualize': False}
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties: 
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.7335
pciBusID 0000:01:00.0
Total memory: 7.92GiB
Free memory: 6.88GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:755] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)
 [*] Reading checkpoints...
 [!] Load failed...

And then I get a little more info at the end when I try with the celebA set:

I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally
{'batch_size': 64,
 'beta1': 0.5,
 'checkpoint_dir': 'checkpoint',
 'dataset': 'celebA',
 'epoch': 25,
 'image_size': 108,
 'is_crop': True,
 'is_train': True,
 'learning_rate': 0.0002,
 'sample_dir': 'samples',
 'train_size': inf,
 'visualize': False}
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties: 
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.7335
pciBusID 0000:01:00.0
Total memory: 7.92GiB
Free memory: 6.87GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:755] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)
 [*] Reading checkpoints...
 [!] Load failed...
F tensorflow/stream_executor/cuda/cuda_dnn.cc:427] could not set cudnn filter descriptor: CUDNN_STATUS_BAD_PARAM
F tensorflow/stream_executor/cuda/cuda_dnn.cc:427] could not set cudnn filter descriptor: CUDNN_STATUS_BAD_PARAM
Aborted (core dumped)

I'm a noob to this universe. Any thoughts? Thank you for your time!

Issue Analytics

Created 7 years ago
Comments: 6 (2 by maintainers)

Top GitHub Comments

6 reactions

WarBean commented, Aug 22, 2016

I fixed the same problem by checking this line in model.py:

data = glob(os.path.join("./data", config.dataset, "*.jpg"))

It turned out that my directory name did not match config.dataset. Try printing the result of this line to make sure you don’t get an empty list. Once the directory name matched, celebA ran for me.
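
A minimal sketch of that check, assuming the same "./data/<dataset>/*.jpg" layout as the line above. The helper name `find_training_images` and its `data_root` and `pattern` parameters are hypothetical and not part of the repository; the idea is only to fail early with a readable message instead of training on an empty list.

```python
import os
from glob import glob


def find_training_images(dataset, data_root="./data", pattern="*.jpg"):
    """Return the sorted image paths for a dataset, or raise if none match.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    search_path = os.path.join(data_root, dataset, pattern)
    paths = sorted(glob(search_path))
    if not paths:
        # An empty list usually means the folder name differs from the
        # dataset name, or the images use a different extension.
        raise FileNotFoundError(
            "No files matched %r; check the folder name and file extension."
            % search_path
        )
    return paths
```

Calling `find_training_images("celebA")` from the project root should then either return the file list or point at the exact path that matched nothing.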

0 reactions

quintendewilde commented, Dec 1, 2017

I have this line instead, but I still get "Load failed"…

self.data = glob(os.path.join("./data", self.dataset_name, self.input_fname_pattern))