
Can't run on v3-8 or v3-32 TPU nodes.


Hi, I trained TPU-accelerated GANs from https://github.com/tensorflow/gan without any issues, but can’t seem to get compare_gan examples to run on GCP TPUs.

Here is the general error. It appears whether I use ctpu, gcloud, or the online GUI to set up compute resources.

tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation input_pipeline_task0/TensorSliceDataset: node input_pipeline_task0/TensorSliceDataset (defined at /usr/local/lib/python3.5/dist-packages/tensorflow_core/python/framework/ops.py:1748) was explicitly assigned to /job:worker/task:0/device:CPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0 ]. Make sure the device specification refers to a valid device.

Any thoughts here? Is there a specific Python/TensorFlow version I should use for running compare_gan?

Thanks!
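One way to read the error above: the op was explicitly pinned to job "worker", but every device the session can see is under job "localhost". That suggests the process is not connected to a TPU worker at all. The hypothetical pure-Python sketch below parses the device strings from the message and reports that mismatch. It does not use TensorFlow and is not part of compare_gan. parse_device and diagnose are illustrative names, and the regex only covers the device-string shapes that appear in this error.

```python
import re

# Hypothetical parser for TF-style device strings such as
# "/job:worker/task:0/device:CPU:0" or
# "/job:localhost/replica:0/task:0/device:XLA_CPU:0".
_DEVICE_RE = re.compile(
    r"^/job:(?P<job>[^/]+)"
    r"(?:/replica:(?P<replica>\d+))?"
    r"(?:/task:(?P<task>\d+))?"
    r"(?:/device:(?P<type>[^:]+):(?P<index>\d+))?$"
)


def parse_device(spec):
    """Split a device string into job/replica/task/type/index fields."""
    m = _DEVICE_RE.match(spec.strip())
    if m is None:
        raise ValueError(f"not a device string: {spec!r}")
    return m.groupdict()


def diagnose(requested, available):
    """Explain (heuristically) why `requested` matches none of `available`."""
    req = parse_device(requested)
    jobs = {parse_device(a)["job"] for a in available}
    if req["job"] not in jobs:
        # The op targets a job the session does not know about, e.g. a
        # remote TPU worker that was never attached to this session.
        return (f"job '{req['job']}' is not known to this session "
                f"(known jobs: {sorted(jobs)})")
    return "job matches; compare task and device type instead"


if __name__ == "__main__":
    print(diagnose(
        "/job:worker/task:0/device:CPU:0",
        ["/job:localhost/replica:0/task:0/device:CPU:0",
         "/job:localhost/replica:0/task:0/device:XLA_CPU:0"],
    ))
```

For the devices in this error, the script reports that job 'worker' is unknown and that the only known job is 'localhost'.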

Issue Analytics

  • State: open
  • Created 3 years ago
  • Comments:7

Top GitHub Comments

2 reactions
mbbrodie commented, Sep 16, 2020

You’re welcome. And yes, you’ll also need to pip install googleapiclient and oauth2client (on PyPI, the googleapiclient module is provided by the google-api-python-client package). For simplicity, you could also change cp /usr/local/cuda-10.1/lib64/libcudart.so libcudart.so.10.0 (and the other cp commands) to sudo cp /usr/local/cuda-10.1/lib64/libcudart.so /usr/local/cuda-10.1/lib64/libcudart.so.10.0, so the renamed copies land in the CUDA library directory itself.

Otherwise, make sure the directory containing the renamed libraries is on your LD_LIBRARY_PATH.
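The copy-and-rename workaround in the comment above can be scripted. The sketch below uses hypothetical helpers, alias_cuda_lib and ld_library_path_with, which are illustrative names and not part of compare_gan or CUDA. It assumes you already know which library names your TensorFlow build expects. The thread only names libcudart.so.10.0 explicitly, so the other libraries are left to you.

```python
import os
import shutil
from pathlib import Path


def alias_cuda_lib(lib_dir, src_name, alias_name):
    """Copy lib_dir/src_name to lib_dir/alias_name (hypothetical helper).

    Mirrors `sudo cp .../libcudart.so .../libcudart.so.10.0`. Writing into
    /usr/local/cuda-10.1/lib64 normally requires root.
    """
    src = Path(lib_dir) / src_name
    dst = Path(lib_dir) / alias_name
    if not src.exists():
        raise FileNotFoundError(src)
    # Like plain cp, copy2 follows a symlinked source and copies the real file.
    shutil.copy2(src, dst)
    return dst


def ld_library_path_with(lib_dir, current=None):
    """Return an LD_LIBRARY_PATH value with lib_dir first and no duplicates.

    The dynamic loader reads LD_LIBRARY_PATH when a process starts, so export
    this value in the shell *before* launching Python/TensorFlow. Changing
    os.environ inside an already-running process is not enough.
    """
    if current is None:
        current = os.environ.get("LD_LIBRARY_PATH", "")
    parts = [p for p in current.split(os.pathsep) if p and p != str(lib_dir)]
    return os.pathsep.join([str(lib_dir)] + parts)


if __name__ == "__main__":
    lib_dir = "/usr/local/cuda-10.1/lib64"
    # Example from the thread; uncomment when running with write access.
    # alias_cuda_lib(lib_dir, "libcudart.so", "libcudart.so.10.0")
    print("export LD_LIBRARY_PATH=" + ld_library_path_with(lib_dir))
```

Run the copy with sufficient permissions, then paste the printed export line into the shell that launches training.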

1 reaction
hytseng0509 commented, Sep 16, 2020

@mbbrodie Thanks for the information! Did you encounter an error message requesting the installation of googleapiclient and oauth2client in your setup?
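To check up front whether the two modules named in this thread are importable, you can use a small script like the one below. It is a minimal sketch, not part of compare_gan. It assumes that a module which cannot be found is what triggers the install request. missing_packages and the REQUIRED mapping are illustrative names. The mapping translates each import name to the PyPI package that provides it.

```python
import importlib.util

# Import name -> PyPI package that provides it.
# googleapiclient ships in google-api-python-client.
REQUIRED = {
    "googleapiclient": "google-api-python-client",
    "oauth2client": "oauth2client",
}


def missing_packages(required=REQUIRED):
    """Return the pip package names whose import name cannot be found."""
    missing = []
    for import_name, pip_name in required.items():
        try:
            found = importlib.util.find_spec(import_name) is not None
        except (ImportError, ValueError):
            found = False
        if not found:
            missing.append(pip_name)
    return missing


if __name__ == "__main__":
    todo = missing_packages()
    if todo:
        print("pip install " + " ".join(todo))
    else:
        print("all required modules are importable")
```

find_spec only locates the module without importing it, so the check is cheap and has no side effects.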

