`google.auth.exceptions.RefreshError` with excessive concurrent requests.

gcsfs propagates a google.auth.exceptions.RefreshError when executing many concurrent requests from a single node using the google_default credentials class. This is likely due to an excessive number of repeated requests to the internal metadata service. This is a known bug in the external library, tracked at GoogleCloudPlatform/google-auth-library-python#211.

Anecdotally, I’ve primarily observed this in dask.distributed workers, and I believe it is caused by the way GCSFiles are distributed. It mainly occurs when a large number of small files are being read from storage and many worker threads are performing concurrent reads. I believe the GCSFiles serialized in dask tasks each instantiate a separate GCSFileSystem, resolve credentials, and open a session.

If this is the case, it would be preferable to store a fixed set of AuthenticatedSession handles, ideally via a cache on the GCSFileSystem class, and dispatch to an auth-method-specific session in the GCSFileSystem._connect_* connection functions.
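To make the caching idea concrete, here is a minimal sketch of a per-auth-method session cache. It is not gcsfs code: `SessionCache`, `connect_google_default`, and the `calls` list are hypothetical names used only for illustration, and the factory returns a plain object in place of a real AuthenticatedSession.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SessionCache:
    """Hypothetical cache holding one session per auth method."""

    def __init__(self, factories):
        # factories maps an auth method name to a zero-argument callable
        # that performs the (expensive) credential resolution.
        self._factories = dict(factories)
        self._sessions = {}
        self._lock = threading.Lock()

    def get(self, method):
        # The lock ensures concurrent callers trigger at most one
        # credential resolution per auth method.
        with self._lock:
            if method not in self._sessions:
                self._sessions[method] = self._factories[method]()
            return self._sessions[method]


calls = []  # records how often credentials were resolved


def connect_google_default():
    # Stand-in for real credential resolution (assumption: the real
    # function would contact the metadata service here).
    calls.append("google_default")
    return object()


cache = SessionCache({"google_default": connect_google_default})

# Many concurrent readers share the single cached session.
with ThreadPoolExecutor(max_workers=8) as pool:
    sessions = list(pool.map(lambda _: cache.get("google_default"), range(32)))
```

With a cache like this, the number of credential lookups depends on the number of auth methods in use rather than on the number of files or worker threads.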

As a more specific solution, google.auth.exceptions.RefreshError or its base class should be added to the list of retried exceptions in _call. However, this may mask legitimate authentication errors, so the credentials should probably be “tested” during session initialization via some call that does not retry this error. This may be as simple as calling session.credentials.refresh or performing a single authenticated request after session initialization.
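The retry half of this proposal might look roughly like the sketch below. It is an illustration under stated assumptions, not the gcsfs `_call` implementation: `call_with_retry`, `flaky_request`, and the backoff parameters are hypothetical, and a stand-in `RefreshError` class is defined when google-auth is not installed.

```python
import time

try:
    from google.auth.exceptions import RefreshError
except ImportError:
    # google-auth is not installed; stand-in class for illustration only.
    class RefreshError(Exception):
        pass


def call_with_retry(func, retries=5, base_delay=0.1, sleep=time.sleep):
    """Call func(), retrying on RefreshError with exponential backoff."""
    for attempt in range(retries):
        try:
            return func()
        except RefreshError:
            if attempt == retries - 1:
                raise  # out of attempts: surface the error
            sleep(base_delay * 2 ** attempt)


attempts = {"n": 0}


def flaky_request():
    # Hypothetical request that fails twice, as if the metadata service
    # were briefly overloaded, and then succeeds.
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RefreshError("metadata service unavailable")
    return "ok"


# sleep is replaced with a no-op so the example runs instantly.
result = call_with_retry(flaky_request, sleep=lambda _: None)
```

To avoid masking bad credentials, the initial credential check at session setup would be made once outside such a wrapper, so a genuine authentication failure is raised immediately instead of being retried.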

Issue Analytics

  • State: open
  • Created 6 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
martindurant commented, Oct 21, 2019

gcsfs is used routinely with Dask, but does not guarantee thread-safety. Specifically, if you instantiate with the same set of parameters (which would be true for your example), only one instance is created and shared, so only one auth request is sent. However, the underlying requests library is almost, but not entirely, thread-safe: apparently connections can be dropped if a pool fills up. That case seems very unlikely in this kind of use, and should be covered by internal retries.

Directory listings could also potentially fall out of sync, but the code aggressively purges the cache when writing, and in the dask scenario, listings are usually done just once in the client.
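The instance sharing described in this comment can be sketched with a metaclass that returns the same object for identical constructor arguments. This is only an illustration of the idea, not gcsfs's actual implementation: `CachedByArgs`, `FakeFileSystem`, and the `auth_requests` counter are hypothetical, and the sketch assumes all constructor arguments are hashable.

```python
import threading


class CachedByArgs(type):
    """Hypothetical metaclass: one shared instance per set of arguments."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        cls._instances = {}
        cls._instances_lock = threading.Lock()

    def __call__(cls, *args, **kwargs):
        # Build a hashable key from the constructor arguments.
        key = (args, tuple(sorted(kwargs.items())))
        with cls._instances_lock:
            if key not in cls._instances:
                cls._instances[key] = super().__call__(*args, **kwargs)
            return cls._instances[key]


class FakeFileSystem(metaclass=CachedByArgs):
    auth_requests = 0  # counts simulated auth requests

    def __init__(self, project=None, token=None):
        self.project = project
        # Stand-in for the auth request made on first construction.
        type(self).auth_requests += 1


a = FakeFileSystem(project="project_name")
b = FakeFileSystem(project="project_name")   # same parameters: shared
c = FakeFileSystem(project="other_project")  # different parameters: new
```

Here `a` and `b` are the same object, so only two simulated auth requests are made in total across the three constructor calls.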

0 reactions
skeller88 commented, Oct 21, 2019

Is gcsfs thread-safe? A dask worker could be running multiple threads. For example:

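import dask
import gcsfs
import imageio

# One module-level filesystem instance, shared by every worker thread
fs = gcsfs.GCSFileSystem(project='project_name')

def read_from_gcs(filename):
    r = fs.cat(filename)  # raw bytes of the object
    return imageio.core.asarray(imageio.imread(r, 'TIFF'))

delayed_read = dask.delayed(read_from_gcs, pure=True)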