Dask config size limitation in EC2Cluster

It seems there is a 16 KB (16384-byte) limit on the amount of user_data that can be passed to an EC2 instance at startup.

We serialize the local Dask config and pass it to the scheduler and workers via the user_data.

https://github.com/dask/dask-cloudprovider/blob/da454827b88c2f0f0d06af07e02d5d1580a4c366/dask_cloudprovider/generic/vmcluster.py#L33-L35

Depending on what config the user has locally, this can tip us over the limit and cause the AWS API to reject the instance creation call:

botocore.exceptions.ClientError: An error occurred (InvalidParameterValue) when calling the RunInstances operation: User data is limited to 16384 bytes
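
As a rough illustration of the failure mode, the sketch below estimates whether an encoded config would fit under the 16384-byte limit quoted in the error. It is not the dask-cloudprovider implementation: JSON plus base64 stands in for whatever serialization the library actually uses, and the names `EC2_USER_DATA_LIMIT`, `encoded_size`, `fits_in_user_data` and the `overhead` parameter are hypothetical, introduced only for this example.

```python
import base64
import json

# Limit taken from the RunInstances error message above.
EC2_USER_DATA_LIMIT = 16384  # bytes


def encoded_size(config: dict) -> int:
    """Size in bytes of a config after JSON + base64 encoding.

    Assumption: JSON + base64 is only a stand-in for the real
    serialization, so treat the result as an estimate.
    """
    raw = json.dumps(config).encode("utf-8")
    return len(base64.b64encode(raw))


def fits_in_user_data(config: dict, overhead: int = 0) -> bool:
    """True if the encoded config plus other user_data content fits.

    `overhead` is a hypothetical allowance for the rest of the
    user_data script (startup commands, other environment variables).
    """
    return encoded_size(config) + overhead <= EC2_USER_DATA_LIMIT


# A small config fits comfortably; a config carrying a large value does not.
small = {"distributed": {"scheduler": {"work-stealing": True}}}
large = {"blob": "x" * 20000}

print(fits_in_user_data(small))
print(fits_in_user_data(large))
```

Running a check like this against the local config before creating the cluster would surface the problem earlier than the AWS API call does, though the real threshold depends on how much of the user_data the rest of the startup script consumes.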

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Reactions: 1
  • Comments: 11 (5 by maintainers)

Top GitHub Comments

shireenrao commented, Sep 30, 2021 (1 reaction)

@jacobtomlinson - I see what mistake I made. You pointed me in the right direction. When I first saw this error, I found this ticket. In the same REPL session, I tried passing in security=False and still saw this error. In a fresh session, if you start with security=False, this works! Your comment above, “Restarting your Python process or notebook kernel will clear out any cached certs”, makes sense now.

Thank you!

jacobtomlinson commented, Oct 18, 2022 (0 reactions)

We enabled security=True by default because other default behaviour can cause your cluster to be exposed to the internet. However, Dask is typically deployed with security=False and folks use network-level security to secure their clusters, so I’d push back against this not being a production-grade workaround. For example, on Kubernetes you would use a service like Istio to handle this at the network layer.

I totally agree, though, that it’s an unpleasant workaround, and if there is a strong desire by the community to resolve this then I’m all for it. Do you have thoughts on a long-term solution?
