AzureVMCluster doesn't correctly load config
What happened:
When I defined a default set of arguments for AzureVMCluster in ~/.config/dask/cloudprovider.yaml and tried to create a cluster instance with no configuration arguments, I received an error.
What you expected to happen:
I would have expected to get an AzureVMCluster instance matching the config specified in the yaml file.
Minimal Complete Verifiable Example:
I set the following config in ~/.config/dask/cloudprovider.yaml:

cloudprovider:
  azure:
    location: "westeurope" # The Azure location to launch your cluster
    resource_group: "HelloML" # The Azure resource group for the cluster
    azurevm:
      vnet: "dask-cp-vnet" # Azure Virtual Network to launch VMs in
      security_group: "dask-cp-nsg" # Network security group to allow 8786 and 8787
      vm_size: "Standard_D2_v3" # Azure VM size to use for scheduler and workers
This should be sufficient and valid config to create an AzureVMCluster instance, as I have constructed other clusters using the same config passed as kwargs to the constructor.
However, when I try to create a cluster without specifying this config again, for example:
avmc = AzureVMCluster(n_workers=1)
I get the following error:
---------------------------------------------------------------------------
ConfigError Traceback (most recent call last)
<ipython-input-2-558400e25ae3> in <module>
----> 1 avmc = AzureVMCluster(n_workers=1)
/anaconda/envs/tiledb-dev/lib/python3.8/site-packages/dask_cloudprovider/azure/azurevm.py in __init__(self, location, resource_group, vnet, security_group, public_ingress, vm_size, scheduler_vm_size, vm_image, disk_size, bootstrap, auto_shutdown, docker_image, debug, marketplace_plan, **kwargs)
477 )
478 if self.resource_group is None:
--> 479 raise ConfigError("You must configure a resource_group")
480 self.public_ingress = (
481 public_ingress
ConfigError: You must configure a resource_group
Anything else we need to know?:
Looking at AzureVMCluster.__init__ there seem to be a couple of different approaches taken to get config items:
- dask.config.get("cloudprovider.azure.location") (L469)
- self.config.get("resource_group") (L476 etc.)
Changing L476 to follow the pattern of L469 (that is, so that it reads dask.config.get("cloudprovider.azure.resource_group")) fixed the problem for me, which surprised me a little, as I might have expected to need to change all the config getters to follow the pattern of L469. I assume the difference is due to the level within the yaml file that provides each config item: items at the cloudprovider.azure level need to be retrieved from dask.config, but items within the sub-level cloudprovider.azure.azurevm are automatically loaded as part of the AzureVMCluster instantiation?
I’m happy to raise this change as a PR, but I wanted to check first that my assumption above was correct! For example, presumably another route to fixing this would be to change where the resource_group item is defined within the yaml file, but I don’t know if that would have other unexpected impacts.
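To make the assumption above concrete, here is a minimal sketch in plain Python that models the two lookup patterns without importing dask. It assumes, as guessed in this issue and not confirmed from the source, that self.config holds only the cloudprovider.azure.azurevm sub-dict. CONFIG and get_dotted are hypothetical stand-ins for the parsed yaml file and for a dotted-key lookup in the style of dask.config.get; they are not part of dask_cloudprovider.

```python
# Stand-in for the parsed ~/.config/dask/cloudprovider.yaml from this issue.
CONFIG = {
    "cloudprovider": {
        "azure": {
            "location": "westeurope",
            "resource_group": "HelloML",
            "azurevm": {
                "vnet": "dask-cp-vnet",
                "security_group": "dask-cp-nsg",
                "vm_size": "Standard_D2_v3",
            },
        }
    }
}


def get_dotted(config, key, default=None):
    """Hypothetical helper: walk a nested dict using a dotted key."""
    node = config
    for part in key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


# Assumption: self.config is only the azurevm sub-dict.
azurevm_config = get_dotted(CONFIG, "cloudprovider.azure.azurevm", {})

# Pattern used at L476: look in the sub-dict, which has no resource_group key.
via_sub_dict = azurevm_config.get("resource_group")

# Pattern used at L469: dotted lookup from the top of the config tree.
via_dotted = get_dotted(CONFIG, "cloudprovider.azure.resource_group")

print(via_sub_dict)                # None
print(via_dotted)                  # HelloML
print(azurevm_config.get("vnet"))  # dask-cp-vnet
```

In this model, keys defined under azurevm (vnet, security_group, vm_size) resolve through the sub-dict, which would explain why only the resource_group getter needed changing. Moving resource_group under azurevm in the yaml file would also make the sub-dict lookup succeed, but any code that reads cloudprovider.azure.resource_group by dotted key would then find nothing, so that route may have side effects.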
Environment:
- Dask version: 2021.06.2
- Python version: 3.8
- Operating System: AzureML Linux (Ubuntu 18.10 I think)
- Install method (conda, pip, source): conda
Top GitHub Comments
Thanks for the input @jacobtomlinson! I’ll have a look at raising that PR - I’ll also take a look at the definition of self.config in case it does turn out to be a simple change…
I can only come up with one pattern for using a partial with config, and it doesn’t work…
because:
I don’t use partials very much! Am I missing a more obvious pattern? I’d typically go about this by using a class, such as:
Then: