AzureVMCluster constructor just hangs after creating the scheduler.
What happened:
After calling the AzureVMCluster constructor to create a new cluster, the run hangs once the scheduler has been created.
In the Azure Portal the scheduler VM appears after it is created, runs for a couple of minutes and is then stopped, presumably an indication that something went wrong. The run does not fail, however; it just hangs.
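One way to make a hang like this fail loudly is to run the constructor in a background thread and stop waiting after a deadline. The helper below is a hypothetical sketch (call_with_timeout is not part of dask-cloudprovider). It only stops waiting; it does not cancel the call, so any VMs already created still have to be removed by hand.

```python
import threading
from typing import Any, Callable, Dict


def call_with_timeout(fn: Callable[[], Any], timeout: float) -> Any:
    """Run fn() in a daemon thread; raise TimeoutError if it has not returned in time.

    Python cannot kill a thread, so on timeout the call is abandoned, not
    cancelled: cloud resources it already created keep running.
    """
    outcome: Dict[str, Any] = {}

    def target() -> None:
        try:
            outcome["value"] = fn()
        except BaseException as exc:  # hand any error back to the caller
            outcome["error"] = exc

    # daemon=True so an abandoned call does not keep the interpreter alive at exit
    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout)
    if worker.is_alive():
        raise TimeoutError(f"call did not return within {timeout} seconds")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]
```

For example: cluster = call_with_timeout(lambda: AzureVMCluster(...), timeout=20 * 60), with the same arguments as in the example and an arbitrarily chosen 20-minute deadline. This assumes the constructor behaves the same when called from a non-main thread, which has not been verified here.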
What you expected to happen: the cluster is created with the specified number of workers.
Minimal Complete Verifiable Example: in a new Conda environment, run
pip install dask-cloudprovider[azure]
az login
and then:
from dask_cloudprovider.azure import AzureVMCluster
resource_group = "NGC-AML-Quick-Launch"
workspace_name = "NGC_AML_Quick_Launch_WS"
vnet = "NGC-AML-Quick-Launch-vnet"
security_group = "NGC-AML-Quick-Launch-nsg"
initial_node_count = 2
vm_size = "Standard_NC6s_v3"
location = "South Central US"
# Only the last assignment of each variable takes effect; the earlier ones are kept for reference.
# base_dockerfile = "rapidsai/rapidsai-core:cuda10.2-runtime-ubuntu18.04-py3.8"
base_dockerfile = "rapidsai/rapidsai-core-dev-nightly:0.18-cuda10.2-devel-ubuntu18.04-py3.8"
# env_vars = {"EXTRA_CONDA_PACKAGES": "pywin32", "EXTRA_PIP_PACKAGES": "dask-cloudprovider[azure] dask-cloudprovider[azure] --upgrade gcsfs dask_xgboost azureml"}
env_vars = {"EXTRA_PIP_PACKAGES": "dask-cloudprovider[azure]"}
cluster = AzureVMCluster(
    resource_group=resource_group,
    location=location,
    vnet=vnet,
    security_group=security_group,
    n_workers=initial_node_count,
    vm_size=vm_size,
    docker_image=base_dockerfile,
    docker_args="--privileged",
    security=False,
    env_vars=env_vars,
    worker_class="dask_cuda.CUDAWorker",
)
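In the example, the second env_vars assignment overrides the first, so only dask-cloudprovider[azure] is actually requested. The first version repeats that package and also asks conda for pywin32, a Windows-only package that is unlikely to install in an Ubuntu-based image. A small hypothetical helper (build_env_vars is not a dask-cloudprovider function) that assembles the space-separated strings from lists and drops duplicates makes such slips harder. It assumes the image's entrypoint reads EXTRA_PIP_PACKAGES and EXTRA_CONDA_PACKAGES as space-separated package lists, as the daskdev/dask images do; whether the RAPIDS 0.18 images behave the same way is not checked here.

```python
from typing import Dict, Iterable, List


def build_env_vars(
    pip_packages: Iterable[str] = (), conda_packages: Iterable[str] = ()
) -> Dict[str, str]:
    """Join package specs into space-separated strings for the container's env vars.

    Duplicates are dropped (first occurrence wins) and empty variables are omitted.
    """

    def dedupe(specs: Iterable[str]) -> List[str]:
        kept: List[str] = []
        for spec in specs:
            spec = spec.strip()
            if spec and spec not in kept:
                kept.append(spec)
        return kept

    env: Dict[str, str] = {}
    pip = dedupe(pip_packages)
    conda = dedupe(conda_packages)
    if pip:
        env["EXTRA_PIP_PACKAGES"] = " ".join(pip)
    if conda:
        env["EXTRA_CONDA_PACKAGES"] = " ".join(conda)
    return env
```

For example, env_vars = build_env_vars(pip_packages=["dask-cloudprovider[azure]", "gcsfs"]) yields a single EXTRA_PIP_PACKAGES entry with each package listed once.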
Anything else we need to know?:
The VM dask-7984db15-scheduler is created and is visible in the Azure Portal. It runs for a few minutes and is then shut down, but the run never fails; it just hangs.
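A scheduler VM that shuts down while the client keeps waiting may mean the client never got a response from the scheduler. Checking from the client machine whether the scheduler VM accepts TCP connections on 8786 (Dask's default scheduler port) and 8787 (the default dashboard port) can help tell a scheduler that never started apart from a port blocked by the network security group. This assumes the cluster uses the default ports; port_is_open is a hypothetical standard-library helper, and the address to probe is whatever public IP the Portal shows for the scheduler VM.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out (socket.timeout subclasses OSError)
        return False
```

For example, port_is_open("<scheduler-public-ip>", 8786) while the scheduler VM is still running.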
Environment:
- Dask version: 2021.02.0
- Python version: 3.8
- Operating System: Windows
- Install method (conda, pip, source): pip
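For an issue like this it can help to also list the versions of the other packages involved, for example distributed and dask-cloudprovider. The sketch below uses importlib.metadata (available from Python 3.8) and reports None for anything not installed; collect_versions is a hypothetical helper and the package list is only a suggestion. It describes the client environment only; the Docker image running on the VMs has its own versions, which may differ.

```python
import platform
from importlib import metadata
from typing import Dict, Iterable, Optional


def collect_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is missing."""
    versions: Dict[str, Optional[str]] = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions
```

For example: collect_versions(["dask", "distributed", "dask-cloudprovider"]).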
Top GitHub Comments
That file will exist on the Dask nodes, not the Jupyter Lab instance.
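Because the file lives on the Dask nodes, it has to be read there. With a connected distributed Client, Client.run executes a function on every worker and returns a dict keyed by worker address, and Client.run_on_scheduler does the same on the scheduler. The helper below is a hypothetical standard-library function suited to that; the path in the usage line is a placeholder, since the comment does not say which file is meant.

```python
import os
from typing import Optional


def read_text_if_exists(path: str, max_chars: int = 10_000) -> Optional[str]:
    """Return the last max_chars characters of the file at path, or None if it is missing.

    Uses only the standard library so it can be shipped to remote machines
    with Client.run / Client.run_on_scheduler.
    """
    if not os.path.exists(path):
        return None
    with open(path, encoding="utf-8", errors="replace") as fh:
        text = fh.read()
    return text[-max_chars:] if max_chars > 0 else ""
```

For example: client.run(read_text_if_exists, "/path/to/file.log").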
What was your EXTRA_PIP_PACKAGES value in this case?