
Taking around 7 min to create a GPU runner

See original GitHub issue

I’m trying to create a GPU runner from GCP Compute Engine, but it takes around 7 minutes. I could not tell whether this is normal behavior for a CML cloud runner or not.

You can check the full GitHub Actions log here. Thanks.

Run cml runner \
  cml runner \
      --cloud=gcp \
      --cloud-region=us-central1-a \
      --cloud-type=n1-highmem-2 \
      --labels=voicebook-de-gpu \
      --single=true \
      --cloud-gpu=nvidia-tesla-t4 \
      --cloud-spot=false \
      --cloud-hdd-size=50
  shell: /usr/bin/bash -e {0}
  env:
    REPO_TOKEN: ***
    GOOGLE_APPLICATION_CREDENTIALS_DATA: ***
***"level":"warn","message":"ignoring RUNNER_NAME environment variable, use CML_RUNNER_NAME or --name instead"***
***"level":"info","message":"Preparing workdir /home/runner/.cml/cml-mjk7ogqgly..."***
***"level":"info","message":"Deploying cloud runner plan..."***
***"level":"info","message":"Terraform apply..."***
***"level":"info","message":"Terraform 1.2.3"***
***"level":"info","message":"iterative_cml_runner.runner: Plan to create"***
***"level":"info","message":"Plan: 1 to add, 0 to change, 0 to destroy."***
***"level":"info","message":"iterative_cml_runner.runner: Creating..."***
***"level":"info","message":"iterative_cml_runner.runner: Still creating... [10s elapsed]"***
***"level":"info","message":"iterative_cml_runner.runner: Still creating... [20s elapsed]"***
***"level":"info","message":"iterative_cml_runner.runner: Still creating... [30s elapsed]"***
***"level":"info","message":"iterative_cml_runner.runner: Still creating... [40s elapsed]"***
***"level":"info","message":"iterative_cml_runner.runner: Still creating... [50s elapsed]"***
***"level":"info","message":"iterative_cml_runner.runner: Still creating... [1m0s elapsed]"***
***"level":"info","message":"iterative_cml_runner.runner: Still creating... [1m10s elapsed]"***
***"level":"info","message":"iterative_cml_runner.runner: Still creating... [1m20s elapsed]"***
***"level":"info","message":"iterative_cml_runner.runner: Still creating... [1m30s elapsed]"***
***"level":"info","message":"iterative_cml_runner.runner: Still creating... [1m40s elapsed]"***
***"level":"info","message":"iterative_cml_runner.runner: Still creating... [1m50s elapsed]"***
***"level":"info","message":"iterative_cml_runner.runner: Still creating... [2m0s elapsed]"***
***"level":"info","message":"iterative_cml_runner.runner: Still creating... [2m10s elapsed]"***
***"level":"info","message":"iterative_cml_runner.runner: Still creating... [2m20s elapsed]"***
***"level":"info","message":"iterative_cml_runner.runner: Still creating... [2m30s elapsed]"***
***"level":"info","message":"iterative_cml_runner.runner: Still creating... [2m40s elapsed]"***
***"level":"info","message":"iterative_cml_runner.runner: Still creating... [2m50s elapsed]"***
***"level":"info","message":"iterative_cml_runner.runner: Still creating... [3m0s elapsed]"***
***"level":"info","message":"iterative_cml_runner.runner: Still creating... [3m10s elapsed]"***
***"level":"info","message":"iterative_cml_runner.runner: Still creating... [3m20s elapsed]"***
***"level":"info","message":"iterative_cml_runner.runner: Still creating... [3m30s elapsed]"***
***"level":"info","message":"iterative_cml_runner.runner: Still creating... [3m40s elapsed]"***
***"level":"info","message":"iterative_cml_runner.runner: Still creating... [3m50s elapsed]"***
***"level":"info","message":"iterative_cml_runner.runner: Still creating... [4m0s elapsed]"***
***"level":"info","message":"iterative_cml_runner.runner: Still creating... [4m10s elapsed]"***
***"level":"info","message":"iterative_cml_runner.runner: Still creating... [4m20s elapsed]"***
***"level":"info","message":"iterative_cml_runner.runner: Still creating... [4m30s elapsed]"***
***"level":"info","message":"iterative_cml_runner.runner: Still creating... [4m40s elapsed]"***
***"level":"info","message":"iterative_cml_runner.runner: Still creating... [4m50s elapsed]"***
***"level":"info","message":"iterative_cml_runner.runner: Still creating... [5m0s elapsed]"***
***"level":"info","message":"iterative_cml_runner.runner: Still creating... [5m10s elapsed]"***
***"level":"info","message":"iterative_cml_runner.runner: Still creating... [5m20s elapsed]"***
***"level":"info","message":"iterative_cml_runner.runner: Still creating... [5m30s elapsed]"***
***"level":"info","message":"iterative_cml_runner.runner: Still creating... [5m40s elapsed]"***
***"level":"info","message":"iterative_cml_runner.runner: Still creating... [5m50s elapsed]"***
***"level":"info","message":"iterative_cml_runner.runner: Still creating... [6m0s elapsed]"***
***"level":"info","message":"iterative_cml_runner.runner: Still creating... [6m10s elapsed]"***
***"level":"info","message":"iterative_cml_runner.runner: Still creating... [6m20s elapsed]"***
***"level":"info","message":"iterative_cml_runner.runner: Still creating... [6m30s elapsed]"***
***"level":"info","message":"iterative_cml_runner.runner: Still creating... [6m40s elapsed]"***
***"level":"info","message":"iterative_cml_runner.runner: Still creating... [6m50s elapsed]"***
***"level":"info","message":"iterative_cml_runner.runner: Creation complete after 6m53s [id=iterative-uvcj2vc6xb9]"***
***"level":"info","message":"Apply complete! Resources: 1 added, 0 changed, 0 destroyed."***
***"level":"info","message":"Outputs: 0"***
***"level":"info","message":"***\"awsSecurityGroup\":null,\"awsSubnetId\":null,\"cloud\":\"gcp\",\"driver\":\"github\",\"id\":\"iterative-uvcj2vc6xb9\",\"idleTimeout\":300,\"image\":null,\"instanceGpu\":\"nvidia-tesla-t4\",\"instanceHddSize\":50,\"instanceIp\":\"35.209.218.96\",\"instanceLaunchTime\":\"2022-07-04 03:27:45.889910336 +0000 UTC m=+39.123498[61](https://github.com/hishab-nlp/voiceboohttps://github.com/<user>/repository>/runs/7173659761?check_suite_focus=true#step:4:61)6\",\"instanceType\":\"n1-highmem-2\",\"instancePermissionSet\":null,\"labels\":\"voicebook-de-gpu\",\"cmlVersion\":\"0.[16](https://github.com/hishab-nlp/voicebook-reporting-dialogue-engine/runs/7173659761?check_suite_focus=true#step:4:17).1\",\"metadata\":null,\"name\":\"cml-mjk7ogqgly\",\"region\":\"us-central1-a\",\"repo\":\"<repo infor>",\"single\":true,\"spot\":false,\"spotPrice\":-1,\"timeouts\":null***"***

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 11 (7 by maintainers)

Top GitHub Comments

2 reactions
dacbd commented, Jul 5, 2022

Having tested nearly the exact command a few times, I think there isn’t anything obviously slowing down the process. The driver setup when using a GPU seems to be the slowest factor, around 2 additional minutes on average (this line, for my future reference).
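
A rough way to sanity-check that estimate on your own project is to launch the same instance without the GPU flag and compare the step duration against the ~7 min GPU run in the log above. The label below is just a placeholder; the remaining flags mirror the original command.

# Hypothetical baseline launch: identical instance, no --cloud-gpu, so no
# NVIDIA driver installation. The difference in step duration versus the GPU
# launch above should roughly match the ~2 extra minutes mentioned here.
cml runner \
    --cloud=gcp \
    --cloud-region=us-central1-a \
    --cloud-type=n1-highmem-2 \
    --labels=provision-timing-test \
    --single=true \
    --cloud-spot=false \
    --cloud-hdd-size=50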

0 reactions
alexpotv commented, Aug 3, 2022

@dacbd Oh I see… In that case, I’ll keep it exactly like that on my side! Thanks!

Read more comments on GitHub >

Top Results From Across the Web

Step-By-Step guide to Setup GPU with TensorFlow ... - Medium
Running code using PyCharm: Open PyCharm and create a new project. While creating the project, select the base interpreter as a virtual environment that ...
Read more >
Monitor and Improve GPU Usage for Training Deep Learning ...
For users training on GPUs, I looked at their average utilization across all runs. Since launch, we've tracked hundreds of thousands of runs ...
Read more >
Using Your GPU in a Docker Container - Roboflow Blog
The NVIDIA Container Toolkit is the solution to configure your GPU within a Docker container. Follow this step-by-step guide to get started.
Read more >
How much heat and stress can a GPU ACTUALLY take??
I always hear that people are afraid to overclock their graphics card because they are afraid to hurt it... so in an effort...
Read more >
Multi-Process Service :: GPU Deployment and Management ...
MPS is useful when each application process does not generate enough work to saturate the GPU. Multiple processes can be run per node...
Read more >
