Support GCS files without credentials

Problem description

Be able to read public GCS files without providing credentials.

Steps/code to reproduce the problem

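import smart_open
import tensorflow as tf

# Public object; reading it should not need credentials.
path = "gs://tensorflow-nightly/prod/tensorflow/release/ubuntu_16/gpu_py37_full/nightly_release/18/20190813-010608/github/tensorflow/pip_pkg/tf_nightly_gpu-1.15.0.dev20190813-cp37-cp37m-linux_x86_64.whl"

# smart_open: fails because no Google credentials can be found.
try:
    with smart_open.open(path, "rb") as f:
        f.read(1)
except Exception as e:
    print(e)

# TensorFlow: reads the same object without credentials (with a warning).
with tf.io.gfile.GFile(path, "rb") as f, open("out.whl", "wb") as fout:
    fout.write(f.read())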

When I run the above code, smart_open fails with:

Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. For more information, please see https://cloud.google.com/docs/authentication/getting-started

while tf.io is able to successfully download the public file, although with a warning:

W tensorflow/core/platform/cloud/google_auth_provider.cc:184] All attempts to get a Google authentication bearer token failed, returning an empty token. Retrieving token from files failed with "Not found: Could not locate the credentials file.". Retrieving token from GCE failed with "Failed precondition: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Could not resolve host: metadata".

Since it’s possible to download the file, it’s best to not require a credential so that public files can be easily downloaded by anyone.
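
As an illustration of why no credential is needed: public objects are also served over plain HTTPS at https://storage.googleapis.com/<bucket>/<object>. The sketch below is not part of the original report, and the helper names gcs_public_url and download_public_gcs are hypothetical. It uses only the standard library and assumes the object is publicly readable.

```python
import shutil
import urllib.parse
import urllib.request


def gcs_public_url(uri):
    """Map gs://bucket/object to its public HTTPS URL (hypothetical helper)."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a gs:// URI: {uri!r}")
    bucket, _, object_name = uri[len(prefix):].partition("/")
    if not bucket or not object_name:
        raise ValueError(f"URI must name a bucket and an object: {uri!r}")
    # Percent-encode the object name, keeping "/" as a path separator.
    encoded = urllib.parse.quote(object_name, safe="/")
    return f"https://storage.googleapis.com/{bucket}/{encoded}"


def download_public_gcs(uri, dest):
    """Stream a public object to a local file without any credentials."""
    with urllib.request.urlopen(gcs_public_url(uri)) as resp, open(dest, "wb") as fout:
        shutil.copyfileobj(resp, fout)
```

Calling download_public_gcs(path, "out.whl") would then fetch the wheel from the example above. Because smart_open can also read https:// URLs, the URL returned by gcs_public_url could be passed to smart_open.open instead of using urllib directly.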

Versions

Linux-5.4.63-1-lts-x86_64-with-glibc2.2.5
Python 3.8.5 (default, Sep 17 2020, 00:56:56)
smart_open 2.1.1

Checklist

Before you create the issue, please make sure you have:

  • Described the problem clearly
  • Provided a minimal reproducible example, including any required data
  • Provided the version numbers of the relevant software

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 10 (6 by maintainers)

Top GitHub Comments

mpenkov commented, Sep 18, 2020 (2 reactions)

@piskvorky We already have a how-to guide explicitly for capturing edge cases like this.

https://github.com/RaRe-Technologies/smart_open/blob/develop/howto.md

@petedannemann I agree, let’s deal with this in documentation for now.

@ppwwyyxx Please feel free to add to that guide using a PR.

petedannemann commented, Sep 18, 2020 (0 reactions)

Different behavior than the google.cloud.storage API

That’s reasonable. However I thought the exact goal of this project is to provide simpler and more unified (in other words, less backend-specific) APIs. So this argument doesn’t seem very compelling to me.

But I’ll leave that to maintainers who know more about what’s best for the project.

My understanding is that the goal of this project was to provide a unified API for file-like objects. I thought handling authentication to the “file systems” to access these file-like objects was expected to be so different from system to system that smart_open defers to the underlying Python package for each file system for authentication. That is why our transport_params kwarg exists. I defer to the maintainers of this project on this topic though.
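
To make the transport_params route concrete, here is a minimal sketch of anonymous access. It assumes that smart_open's GCS transport accepts a client entry in transport_params and that the installed google-cloud-storage provides Client.create_anonymous_client. The wrapper name open_public_gcs is hypothetical and not part of either library.

```python
def open_public_gcs(uri, mode="rb"):
    """Open a public gs:// object through smart_open without credentials.

    Hypothetical wrapper: it hands smart_open an anonymous
    google.cloud.storage client via transport_params.
    """
    if not uri.startswith("gs://"):
        raise ValueError(f"not a gs:// URI: {uri!r}")

    # Imported lazily so the argument check above works even when the
    # optional GCS dependencies are not installed.
    import smart_open
    from google.cloud.storage import Client

    # An anonymous client skips the credential lookup that raised
    # "Could not automatically determine credentials".
    client = Client.create_anonymous_client()
    return smart_open.open(uri, mode, transport_params=dict(client=client))
```

Usage would look like `with open_public_gcs(path) as fin: data = fin.read()`. This keeps authentication explicit and backend-specific, in line with the point above that smart_open defers such choices to the underlying package.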
