Using GCP service account scopes in self hosted runner

I’m attempting to mount a GCS bucket in a self-hosted runner from CML and encountering multiple authentication problems with gcsfuse.

We are using this definition for our cml runner:

cml runner \
  --cloud=gcp \
  --cloud-region=us-west \
  --cloud-type=m+k80 \
  --labels=cml-gpu \
  --cloud-permission-set=cmldeploy@bp-padang.iam.gserviceaccount.com

We then mount buckets in our project using gcsfuse:

gcsfuse --debug_gcs --implicit-dirs data/

And this returns the following error:

2022/02/15 19:03:17.273266 Start gcsfuse/0.40.0 (Go version go1.17.6) for app "" using mount point: /__w/ml-project-seed/ml-project-seed/data
2022/02/15 19:03:17.287982 Opening GCS connection...
2022/02/15 19:03:17.291621 Mounting file system "gcsfuse"...
2022/02/15 19:03:17.293180 File system has been successfully mounted.
Here are the contents of the mounted path
$ cd data/bp-padang/cloudcover
/__w/_temp/0c805963-6ad9-44af-931d-9f971accd261.sh: 24: cd: can't cd to data/bp-padang/cloudcover
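
In the log above, gcsfuse reports a successful mount even though the bucket directory cannot be entered, so a workflow step may want to fail fast right after mounting. The following is a minimal sketch, assuming a POSIX shell; `require_dir` is a hypothetical helper, not part of gcsfuse or CML, and the path in the usage comment is the one from the log.

```shell
#!/bin/sh
# Hypothetical helper: fail with a clear message if a directory that
# should exist under the mount point is missing or cannot be entered.
require_dir() {
  if [ -d "$1" ] && [ -x "$1" ]; then
    return 0
  fi
  echo "error: '$1' is not an accessible directory; check bucket permissions and access scopes" >&2
  return 1
}

# Possible use after mounting (not run here):
#   gcsfuse --debug_gcs --implicit-dirs data/
#   require_dir data/bp-padang/cloudcover || exit 1
```

This does not diagnose the cause; it only turns a silent empty mount into an explicit failure early in the job.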

The service account cmldeploy@bp-padang.iam.gserviceaccount.com has been assigned Storage Admin and Compute Admin roles, so theoretically it should have access to the buckets.

After much trial and error, we were able to set up an instance via Terraform and successfully mount the buckets with gcsfuse using these settings:

resource "google_compute_instance" "jupyter" {
  ....
  service_account { scopes = ["storage-full", "cloud-platform"] }
  ...
}

It looks like the access scopes matter here: the instance needs them in order to reach GCS resources, even when the service account already has the right roles. It would be great if we could set them along with the other parameters in the cml runner command.
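
One way to see which access scopes an instance actually received is to ask the Compute Engine metadata server, which returns one scope URL per line. The following is a minimal sketch, assuming a POSIX shell on the VM; the function name `has_storage_scope` is hypothetical, and it only looks for the scope URLs behind the `storage-ro`, `storage-rw`, `storage-full` and `cloud-platform` aliases.

```shell
#!/bin/sh
# Hypothetical helper: reads scope URLs (one per line) on stdin and
# succeeds if any of them grants access to Cloud Storage.
has_storage_scope() {
  grep -Eq '^https://www\.googleapis\.com/auth/(cloud-platform|devstorage\.(read_only|read_write|full_control))$'
}

# On a Compute Engine VM the scopes could be fetched like this (not run here):
#   curl -s -H "Metadata-Flavor: Google" \
#     "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/scopes" \
#     | has_storage_scope && echo "storage scope present"
```

If the check fails on a runner instance while the service account holds the Storage Admin role, missing scopes are a likely explanation for the empty mount.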

If you have experience mounting GCS buckets in CML-based runners, we would be happy to hear how you accomplished it without the access scopes. Any help would be much appreciated!

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 8 (5 by maintainers)

Top GitHub Comments

dacbd commented, Feb 17, 2022 (5 reactions)

Thanks @DavidGOrtega, and I can confirm that I had a whole pipeline work successfully with:

          cml-runner \
            ...
            --cloud=gcp \
            --cloud-permission-set=dvc-object-storage@xxx.iam.gserviceaccount.com,scopes=storage-rw \
            ...

And DVC on the instance automagically had correct permission to the remote bucket 🥳 🎈

rodrigoalmeida94 commented, Feb 18, 2022 (2 reactions)

I can also confirm I was able to run a workflow successfully when mounting a bucket using gcsfuse! Thanks so much everyone for the really fast turnaround. 🐎 🥳
