GCS FileSystem Registration Error on TPUs


Bug report for Colab: http://colab.research.google.com/.

For questions about Colab usage, please use Stack Overflow.

  • Describe the current behavior:

Since this morning, running

from google.colab import auth
auth.authenticate_user()

with the accelerator set to TPU raises the following error:

InternalError: From /job:tpu_worker/replica:0/task:0:
The filesystem registered under the 'gs://' scheme was not a tensorflow::RetryingGcsFileSystem*.
	 [[node GcsConfigureCredentials (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]

Original stack trace for 'GcsConfigureCredentials':
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py", line 16, in <module>
    app.launch_new_instance()
  File "/usr/local/lib/python3.6/dist-packages/traitlets/config/application.py", line 664, in launch_instance
    app.start()
  File "/usr/local/lib/python3.6/dist-packages/ipykernel/kernelapp.py", line 477, in start
    ioloop.IOLoop.instance().start()
  File "/usr/local/lib/python3.6/dist-packages/tornado/ioloop.py", line 888, in start
    handler_func(fd_obj, events)
  File "/usr/local/lib/python3.6/dist-packages/tornado/stack_context.py", line 277, in null_wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/zmq/eventloop/zmqstream.py", line 450, in _handle_events
    self._handle_recv()
  File "/usr/local/lib/python3.6/dist-packages/zmq/eventloop/zmqstream.py", line 480, in _handle_recv
    self._run_callback(callback, msg)
  File "/usr/local/lib/python3.6/dist-packages/zmq/eventloop/zmqstream.py", line 432, in _run_callback
    callback(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tornado/stack_context.py", line 277, in null_wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/ipykernel/kernelbase.py", line 283, in dispatcher
    return self.dispatch_shell(stream, msg)
  File "/usr/local/lib/python3.6/dist-packages/ipykernel/kernelbase.py", line 235, in dispatch_shell
    handler(stream, idents, msg)
  File "/usr/local/lib/python3.6/dist-packages/ipykernel/kernelbase.py", line 399, in execute_request
    user_expressions, allow_stdin)
  File "/usr/local/lib/python3.6/dist-packages/ipykernel/ipkernel.py", line 196, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "/usr/local/lib/python3.6/dist-packages/ipykernel/zmqshell.py", line 533, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py", line 2718, in run_cell
    interactivity=interactivity, compiler=compiler, result=result)
  File "/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py", line 2828, in run_ast_nodes
    if self.run_code(code, result):
  File "/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py", line 2882, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-3-28843b2fd02c>", line 1, in <module>
    auth.authenticate_user()
  File "/usr/local/lib/python3.6/dist-packages/google/colab/auth.py", line 157, in authenticate_user
    sess, credentials=_json.load(auth_info))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/contrib/cloud/python/ops/gcs_config_ops.py", line 182, in configure_gcs
    return configure(credentials, block_cache)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/contrib/cloud/python/ops/gcs_config_ops.py", line 170, in configure
    op = gen_gcs_config_ops.gcs_configure_credentials(placeholder)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/contrib/cloud/python/ops/gen_gcs_config_ops.py", line 188, in gcs_configure_credentials
    "GcsConfigureCredentials", json=json, name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()

This causes the training program to fail when reading a dataset stored in GCS, with the following error:

Error executing an HTTP request: HTTP response code 403 with body '{
  "error": {
    "code": 403,
    "message": "service-495559152420@cloud-tpu.iam.gserviceaccount.com does not have storage.objects.get access to <...>.tfrecord-00000-of-00001.",
    "errors": [
      {
        "message": "service-495559152420@cloud-tpu.iam.gserviceaccount.com does not have storage.objects.get access to <...>.tfrecord-00000-of-00001.",
        "domain": "global",
'
	 when reading metadata of gs://<...>
	 [[{{node MultiDeviceIteratorGetNextFromShard}}]]
	 [[RemoteCall]]
	 [[IteratorGetNextAsOptional]]
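
The 403 response above names the Cloud TPU service account, not the authenticated user, as the identity that was denied. That is consistent with the user's credentials never reaching the TPU worker. As a rough sketch, the denied identity and permission can be pulled out of such a message with a regular expression. The names `DENIED_ACCESS` and `parse_denied_access` are hypothetical helpers for illustration, and the pattern is only an assumption based on the one message shown in this report.

```python
import re

# Message text copied from the 403 body in this report (path left elided).
ERROR_TEXT = (
    "service-495559152420@cloud-tpu.iam.gserviceaccount.com does not have "
    "storage.objects.get access to <...>.tfrecord-00000-of-00001."
)

# Assumed shape: "<account> does not have <permission> access to ...".
DENIED_ACCESS = re.compile(
    r"(?P<account>[\w.\-]+@[\w.\-]+) does not have "
    r"(?P<permission>[\w.]+) access to"
)


def parse_denied_access(message):
    """Return (account, permission) if the message matches, else None."""
    match = DENIED_ACCESS.search(message)
    if match is None:
        return None
    return match.group("account"), match.group("permission")


if __name__ == "__main__":
    print(parse_denied_access(ERROR_TEXT))
```

Seeing a `cloud-tpu.iam.gserviceaccount.com` account here, rather than a user account, is a quick hint that the failure happened at the credential-forwarding step and not in the bucket's permissions for the user.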
  • Describe the expected behavior:

Authentication should work. It works with the other accelerator settings (None, GPU); the problem occurs only with TPU.

  • The web browser you are using (Chrome, Firefox, Safari, etc.):

Firefox

  • Link to self-contained notebook that reproduces this issue (click the Share button, then Get Shareable Link):

https://colab.research.google.com/drive/1hhy_cgjyz56lCCBfdYhr7k_6dU-zJaTK

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Reactions: 5
  • Comments: 7 (2 by maintainers)

Top GitHub Comments

2 reactions
chris-clem commented, Feb 29, 2020

Hi, I am having this issue again. With a GPU everything works fine, but with a TPU I get the error.

2 reactions
colaboratory-team commented, Oct 16, 2019

The issue is indeed resolved for new VM assignments. If you’re still seeing it, please use “Reset all runtimes” from the Runtime menu or the command palette (cmd/ctrl-shift-P) to get a new VM. Note that you’ll lose any existing VM assignments in all notebooks, so save your work first.
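
After getting a new VM, it can help to tell this specific failure apart from unrelated authentication errors. The following is a minimal sketch, not part of Colab: `is_gcs_registration_error` and `authenticate_with_check` are hypothetical helpers, and matching on the message substring is an assumption based on the error text quoted in this report.

```python
# Substring taken from the InternalError text quoted in this report.
GCS_REGISTRATION_MARKER = "was not a tensorflow::RetryingGcsFileSystem"


def is_gcs_registration_error(exc):
    """Return True if the exception message matches the reported failure."""
    return GCS_REGISTRATION_MARKER in str(exc)


def authenticate_with_check(authenticate):
    """Call `authenticate` and report whether the known failure occurred.

    `authenticate` is any zero-argument callable; in a Colab notebook it
    would be `auth.authenticate_user` from `google.colab`.
    Returns True on success and False if the known error was raised.
    Any other exception is re-raised unchanged.
    """
    try:
        authenticate()
    except Exception as exc:
        if is_gcs_registration_error(exc):
            return False
        raise
    return True


# In a notebook (assumed usage, not run here):
#   from google.colab import auth
#   if not authenticate_with_check(auth.authenticate_user):
#       print("Still affected; try 'Reset all runtimes' to get a new VM.")
```

Passing the authentication function in as a callable keeps the helper importable outside Colab, where `google.colab` is not available.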

