
Load Failure around Checkpoint Reading


Hi! Bumping into some problems trying to run this. If helpful, my setup is a GTX 1080 using CUDA 7.5 with TensorFlow v.8 on Ubuntu 14.04.

With a folder of my own images I'm getting:

I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally
{'batch_size': 64,
 'beta1': 0.5,
 'checkpoint_dir': 'checkpoint',
 'dataset': 'doom2graphics',
 'epoch': 25,
 'image_size': 108,
 'is_crop': False,
 'is_train': True,
 'learning_rate': 0.0002,
 'sample_dir': 'samples',
 'train_size': inf,
 'visualize': False}
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties: 
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.7335
pciBusID 0000:01:00.0
Total memory: 7.92GiB
Free memory: 6.88GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:755] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)
 [*] Reading checkpoints...
 [!] Load failed...

And then I get a little more info at the end when I try with the celebA set:

I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally
{'batch_size': 64,
 'beta1': 0.5,
 'checkpoint_dir': 'checkpoint',
 'dataset': 'celebA',
 'epoch': 25,
 'image_size': 108,
 'is_crop': True,
 'is_train': True,
 'learning_rate': 0.0002,
 'sample_dir': 'samples',
 'train_size': inf,
 'visualize': False}
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties: 
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.7335
pciBusID 0000:01:00.0
Total memory: 7.92GiB
Free memory: 6.87GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:755] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)
 [*] Reading checkpoints...
 [!] Load failed...
F tensorflow/stream_executor/cuda/cuda_dnn.cc:427] could not set cudnn filter descriptor: CUDNN_STATUS_BAD_PARAM
F tensorflow/stream_executor/cuda/cuda_dnn.cc:427] could not set cudnn filter descriptor: CUDNN_STATUS_BAD_PARAM
Aborted (core dumped)

Noob to this universe. Any thoughts? Thank you for the time!!

Issue Analytics

  • State: open
  • Created 7 years ago
  • Comments: 6 (2 by maintainers)

Top GitHub Comments

6 reactions
WarBean commented, Aug 22, 2016

I fixed the same problem by checking this line in model.py:

data = glob(os.path.join("./data", config.dataset, "*.jpg"))

It turned out that my directory name did not match config.dataset, so the glob found no files. Try printing the result of this line to make sure that data is not an empty list. Once the directory name matched, celebA ran.
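The check described above can be sketched as a small standalone helper. This is only an illustration, not code from the repository: the function name `find_dataset_images` and its parameters are hypothetical, and the `./data/<dataset>/*.jpg` layout is taken from the line quoted in this comment.

```python
import os
from glob import glob


def find_dataset_images(dataset, data_root="./data", pattern="*.jpg"):
    """Return the image paths under data_root/dataset that match pattern.

    Hypothetical helper: it mirrors the glob call quoted above, but raises
    a clear error instead of silently returning an empty list.
    """
    search = os.path.join(data_root, dataset, pattern)
    paths = sorted(glob(search))
    if not paths:
        raise FileNotFoundError(
            "No files matched %r; check that the directory name under %r "
            "matches the dataset setting" % (search, data_root)
        )
    return paths
```

Calling `find_dataset_images("celebA")` before training would either return the list of files or stop right away with the path that was searched, which makes a mismatched directory name easier to spot than a later failure.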

0 reactions
quintendewilde commented, Dec 1, 2017

I have this line instead, but I still get "Load failed"…

self.data = glob(os.path.join("./data", self.dataset_name, self.input_fname_pattern))
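With the pattern-based version of the line, one thing that may be worth ruling out is a filename pattern that does not match the files actually on disk (for example `*.jpg` against a folder of `.png` files). The sketch below is an assumption about how one might check that, not a confirmed fix for this comment; `summarize_dataset_dir` is a hypothetical name.

```python
import os
from collections import Counter
from glob import glob


def summarize_dataset_dir(directory, pattern="*.jpg"):
    """Return (files per extension, number of paths matching pattern).

    Hypothetical diagnostic: compares what is in the directory with what
    the glob pattern would actually pick up.
    """
    names = [
        name for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
    ]
    # Extensions are lower-cased only for the summary; glob itself follows
    # the platform's case rules.
    by_ext = Counter(os.path.splitext(name)[1].lower() or "<none>" for name in names)
    matched = len(glob(os.path.join(directory, pattern)))
    return by_ext, matched
```

If `matched` is zero while `by_ext` shows plenty of files under another extension, the pattern rather than the directory name is the likely mismatch.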


