Worker config set by dask.config.set is not read by worker
Configuring Dask directly within Python is explained in the documentation here: Configuration - Directly within Python.
When using dask.config.set, I expect the worker to use those values. Instead, the worker reads the default values and does not use the values set with dask.config.set.
I modified distributed\worker.py as shown below to print the values received by the worker.
if "memory_spill_fraction" in kwargs:
    self.memory_spill_fraction = kwargs.pop("memory_spill_fraction")
    print("self.memory_spill_fraction from kwargs = {}".format(self.memory_spill_fraction))
else:
    self.memory_spill_fraction = dask.config.get(
        "distributed.worker.memory.spill"
    )
    print("self.memory_spill_fraction from dask.config = {}".format(self.memory_spill_fraction))
import dask
import dask.dataframe as dd
from dask.distributed import Client, LocalCluster
import pandas as pd

cluster = LocalCluster()
client = Client(cluster)

new = {"distributed.worker.memory.target": 0.1,
       "distributed.worker.memory.spill": 0.2,
       "distributed.worker.memory.pause": 0.3}

with dask.config.set(new):
    print(dask.config.get("distributed.worker.memory"))

timestamp = pd.date_range('2018-01-01', periods=4, freq='S')
col1 = pd.Series(["1", "3", "5", "7"], dtype="string")
df = pd.DataFrame({"timestamp": timestamp, "col1": col1}).set_index('timestamp')
ddf = dd.from_pandas(df, npartitions=1)
ddf.compute()
ddf.head(2)
Outputs:
self.memory_spill_fraction from dask.config = 0.7
self.memory_spill_fraction from dask.config = 0.7
self.memory_spill_fraction from dask.config = 0.7
self.memory_spill_fraction from dask.config = 0.7
{'target': 0.1, 'spill': 0.2, 'pause': 0.3, 'terminate': 0.4}
Notice the 0.7 value, which is the default.
Passing the configuration via kwargs works.
import dask
import dask.dataframe as dd
from dask.distributed import Client, LocalCluster
import pandas as pd

cluster = LocalCluster(
    memory_target_fraction=0.1,
    memory_spill_fraction=0.2,
    memory_pause_fraction=0.3)
client = Client(cluster)

timestamp = pd.date_range('2018-01-01', periods=4, freq='S')
col1 = pd.Series(["1", "3", "5", "7"], dtype="string")
df = pd.DataFrame({"timestamp": timestamp, "col1": col1}).set_index('timestamp')
ddf = dd.from_pandas(df, npartitions=1)
ddf.compute()
ddf.head(2)
Outputs:
self.memory_spill_fraction from kwargs = 0.2
self.memory_spill_fraction from kwargs = 0.2
self.memory_spill_fraction from kwargs = 0.2
self.memory_spill_fraction from kwargs = 0.2
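Besides kwargs, Dask can also pick up configuration from environment variables, which worker subprocesses inherit from their parent. The documented convention maps a dotted config key to an upper-cased name prefixed with DASK_ and with each "." replaced by "__". A minimal sketch of that mapping (the key_to_env helper below is hypothetical, written for illustration — it is not part of the dask API):

```python
# Sketch of Dask's documented config-key -> environment-variable
# naming convention. `key_to_env` is a hypothetical helper for
# illustration only, not a dask function.
def key_to_env(key: str) -> str:
    # e.g. "distributed.worker.memory.spill"
    #   -> "DASK_DISTRIBUTED__WORKER__MEMORY__SPILL"
    return "DASK_" + key.upper().replace(".", "__")

for key in ("distributed.worker.memory.target",
            "distributed.worker.memory.spill",
            "distributed.worker.memory.pause"):
    print(key_to_env(key))
```

Setting, say, DASK_DISTRIBUTED__WORKER__MEMORY__SPILL=0.2 in the shell before launching the script should make both the parent process and any spawned worker processes see that value, since environment variables survive process creation where an in-process dask.config.set does not.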
Environment:
- Dask version: 2.18.1
- distributed version: 2.18.0
- Python version: 3.8.3
- Operating System: Windows
- Install method: pip
Issue Analytics
- Created 3 years ago
- Comments: 7 (4 by maintainers)
Top GitHub Comments
@samaust @mrocklin
FYI this caught me recently as well. I spent this morning trying to figure out why my dask-jobqueue LSFCluster workers were being killed with:
OSError: Timed out during handshake while connecting to tcp://10.36.110.11:38453 after 10 s
After setting:
from the scheduler process.
I thought the config.set function was broken. IMHO this should be mentioned in the documentation, or, even better, subprocesses should inherit configuration changes from their parents.

Late to the party, but this works for me:
Output:
10ms
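Another way to make such settings visible to worker subprocesses is a YAML configuration file, which every Dask process reads at import time. A sketch mirroring the values from the reproduction above, assuming the default user config directory (e.g. ~/.config/dask/ on Linux):

```yaml
# distributed.yaml -- worker memory thresholds, read at import
# time by every Dask process, including spawned workers
distributed:
  worker:
    memory:
      target: 0.10
      spill: 0.20
      pause: 0.30
```

Because the file is read when each process imports dask, this sidesteps the parent-to-subprocess inheritance problem described in the comment above.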