Configuration File

Continuation of https://github.com/dask/distributed/issues/58

I think it’s now time to have a configuration file. Some options would be easier to manage per machine in a file than through command line options (though these will remain dominant) or hard-coded settings.

Here are a few:

  1. Logging levels for dask
  2. Logging levels for the bokeh web application
  3. Compression
  4. Ports for the scheduler, json, web interface, etc…
  5. Whitelisted ports for bokeh (though this is now open by default)
  6. Whether or not to use PDB when an error occurs (I use this for debugging)

Some open questions:

  1. Where do we put this file? I’m thinking ~/.dask/config
  2. What format do we use: JSON, YAML, TOML, or INI?
  3. Are there other options that people find themselves often setting that we would want to include? We could also just include all options available through the CLI
  4. Desired nesting level? For example
'scheduler': {'port': 8786,
              'bokeh': 8787}, 
...

vs

'scheduler-port': 8786,
'scheduler-bokeh': 8787,
...
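
To make the two layouts concrete, here is a minimal Python sketch that converts between them. The helper names `flatten` and `nest` are hypothetical (they are not part of dask), and the round trip assumes that no individual key segment contains the separator.

```python
def flatten(nested, sep="-"):
    """Turn {'scheduler': {'port': 8786}} into {'scheduler-port': 8786}."""
    flat = {}
    for key, value in nested.items():
        if isinstance(value, dict):
            # Recurse, then prefix every child key with the parent key.
            for sub_key, sub_value in flatten(value, sep).items():
                flat[f"{key}{sep}{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat


def nest(flat, sep="-"):
    """Inverse of flatten, assuming no key segment contains the separator."""
    nested = {}
    for key, value in flat.items():
        *parents, leaf = key.split(sep)
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


nested = {"scheduler": {"port": 8786, "bokeh": 8787}}
flat = flatten(nested)
```

The flat form is easier to map onto command line flags and environment variables, while the nested form groups related settings; either can be derived from the other as long as key names avoid the separator.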

@quasiben I would value your feedback in particular here.

I don’t have much scar tissue on this topic.

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Reactions: 1
  • Comments: 7 (7 by maintainers)

Top GitHub Comments

1 reaction
kszucs commented, Sep 9, 2016

Personally I favor environment variables over any configuration file. In our distributed setup (Docker containers on top of Mesos, Marathon, and Chronos) the common practice is also environment variables; distributing files is far more problematic because it needs shared storage such as HDFS or S3. Click also has built-in support for reading options from environment variables.
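
As a rough illustration of the environment-variable approach, the sketch below uses the standard library's `argparse` rather than Click (Click offers the same idea through the `envvar` argument of `click.option`). The program name, the `--port` option, and the `EXAMPLE_SCHEDULER_PORT` variable are made up for the example; they are not dask's actual names.

```python
import argparse
import os


def build_parser(environ=None):
    """Build a parser whose defaults come from environment variables."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(prog="example-scheduler")
    parser.add_argument(
        "--port",
        type=int,
        # Hypothetical variable name; an explicit --port still wins.
        default=int(environ.get("EXAMPLE_SCHEDULER_PORT", "8786")),
    )
    return parser


# Simulate a container that sets the variable but passes no flags.
args = build_parser({"EXAMPLE_SCHEDULER_PORT": "9000"}).parse_args([])
```

Precedence here is command line first, then environment, then the built-in default, which matches the idea that CLI options stay dominant.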

In our workflow manager a Click CLI script submits the computation as a Chronos or Marathon task (both are meta-schedulers on top of Mesos), which starts a Mesos (dask.mesos) framework, which in turn schedules multiple tasks across the cluster. Any of these tasks can start, for example, a local dask computation, a distributed Spark job, another Mesos framework, or a data migration tool. The workflow manager needs to forward the configuration down to the leaves (for example a Cassandra host:port).

Personally I use dask.context._globals for this purpose. IMHO that would be a better container to store and ship config values (read from cli and environment variables), especially because I can temporarily override the values with set_options.

Auto-shipping can be solved via a custom pickler:

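# Imports assumed for the 2016-era dask and cloudpickle APIs used here
from cloudpickle import CloudPickler
from dask.context import _globals, set_options

def inject_addons(self):
    # on unpickling, re-apply the sender's options via set_options(**opts)
    self.save_reduce(lambda opts: set_options(**opts), (_globals,))

# register reducer to auto pickle _globals configuration
CloudPickler.inject_addons = inject_addons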
0 reactions
mrocklin commented, Jan 14, 2019

There generally is no centralized documentation for these except for the files themselves, which should auto-populate into your ~/.config/dask directory the first time you import any dask sub-project. For the dask-distributed project in particular you can look at https://github.com/dask/distributed/blob/master/distributed/distributed.yaml
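
To see which files have been populated on a given machine, a small standard-library sketch like the following can list them. The function name is hypothetical, and the fallback to a `DASK_CONFIG` environment variable is an assumption about how the directory can be relocated; the default path is the ~/.config/dask directory mentioned above.

```python
import os
from pathlib import Path


def find_dask_config_files(config_dir=None):
    """Return the YAML files in the dask config directory, sorted by path."""
    if config_dir is None:
        # Assumption: DASK_CONFIG, if set, points at the config directory.
        default_dir = os.path.join(os.path.expanduser("~"), ".config", "dask")
        config_dir = os.environ.get("DASK_CONFIG", default_dir)
    root = Path(config_dir)
    if not root.is_dir():
        return []
    return sorted(p for p in root.iterdir() if p.suffix in {".yaml", ".yml"})


for path in find_dask_config_files():
    print(path)
```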

On Mon, Jan 14, 2019 at 12:46 PM Scott Brown notifications@github.com wrote:

Is there documentation for the possible options in a yaml configuration? I can’t seem to locate such a document, and instead find small examples here and there of possible configuration subsets. Where are all possible options documented?

