[tune] Unexpected num_samples and trial execution (setting of self.config) behavior
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Kubuntu 18.04
- Ray installed from (source or binary): Installed binary via pip install
- Ray version: 0.7.6
- Python version: 3.6.8
- Exact command to reproduce:
Describe the problem
This is in reference to the example ray/tune/examples/hyperband_example.py, which sets num_samples=20.
If I instead set the Experiment num_samples=1, then:
- 1 MyTrainableClass instance is created
- _setup() is called once after self.config is set to a randomly generated config dict in that one MyTrainableClass instance
- _train() is called on that one instance many times, without any change to self.config in that one MyTrainableClass instance
If I instead set the Experiment num_samples=2, then:
- 2 MyTrainableClass instances are created, and for each instance:
- _setup() is called once after self.config is set to a randomly generated config dict in the MyTrainableClass instance
- _train() is called many times on each instance, without any change to self.config, over the course of the experiment run
However, the docs at https://ray.readthedocs.io/en/latest/tune-usage.html say: “E.g. in the above, num_samples=10 repeats the 3x3 grid search 10 times, for a total of 90 trials, each with randomly sampled values of alpha and beta”
but in my testing, the number of distinctly configured trials (i.e. distinctly different self.config dicts) is limited to num_samples.

The issue I am having is that in my own MyTrainableClass, for a given self.config dict of values, _train() will always return exactly the same result, so there is no point in calling it more than once for the same self.config. I was expecting self.config to be different each time _train() ends up getting called (presumably exactly once per trial / MyTrainableClass instance created).
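For reference, the docs example being quoted corresponds roughly to a config like the following sketch (the grid-searched parameter names are illustrative, not taken verbatim from the docs):

```python
import numpy as np
from ray import tune

# Illustrative sketch of the quoted docs behavior: the two grid_search
# parameters form a 3x3 grid, while alpha and beta are re-sampled for every
# trial. With num_samples=10 the 3x3 grid is repeated 10 times, giving
# 3 * 3 * 10 = 90 trials, each with its own config dict.
config = {
    "alpha": tune.sample_from(lambda spec: np.random.uniform(0, 1)),
    "beta": tune.sample_from(lambda spec: np.random.normal()),
    "layer1": tune.grid_search([16, 64, 256]),
    "layer2": tune.grid_search([16, 64, 256]),
}
```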
My experiment setup (for XGBoost) looks like this:
```python
# Imports added for completeness (Ray 0.7.6 API). choice, randint, and
# loguniform are sampling helpers imported elsewhere; they are not shown
# in the original snippet.
from ray import tune
from ray.tune import Experiment, sample_from
from ray.tune.schedulers import HyperBandScheduler

hyperBandScheduler = HyperBandScheduler(
    time_attr = "training_iteration",
    # metric = "episode_reward_mean",  # duplicate keyword; mean_accuracy is the metric actually used
    # Want to maximize mean_accuracy
    metric = "mean_accuracy",
    mode = "max",
    max_t = 100)

exp = Experiment(
    resources_per_trial = {"cpu": 1, "gpu": 0},
    stop = {"training_iteration": 99999},
    name = "xgb_auc_optimizer",
    run = MyTrainableClass,
    config = {
        "eval_metric": 'auc',
        "booster": sample_from(lambda spec: choice(["dart", "gbtree", "gblinear"])),
        "max_depth": sample_from(lambda spec: randint(1, 9)),
        "eta": sample_from(lambda spec: loguniform(1e-4, 1e-1)),
        "gamma": sample_from(lambda spec: loguniform(1e-8, 1.0)),
        "grow_policy": sample_from(lambda spec: choice(['depthwise', 'lossguide'])),
    },
)

tune.run(exp,
    scheduler = hyperBandScheduler,
    verbose = 0,
)
```
And _train() returns a dict containing a “mean_accuracy” value.
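For context, a minimal, hypothetical sketch of the shape of such a Trainable (the actual XGBoost training code is not part of the issue, and _fit_and_score below is a stand-in):

```python
from ray.tune import Trainable

class MyTrainableClass(Trainable):
    """Hypothetical sketch of the Trainable described above."""

    def _setup(self, config):
        # Tune sets self.config to the sampled hyperparameter dict for this
        # trial before _setup() runs; one-time initialization goes here.
        pass

    def _train(self):
        # Train an XGBoost model with self.config and score it. For a fixed
        # self.config this is deterministic, so repeated _train() calls
        # return the same value.
        accuracy = self._fit_and_score(self.config)
        return {"mean_accuracy": accuracy}

    def _fit_and_score(self, params):
        # Stand-in for the real xgboost.train(...) + validation scoring.
        return 0.5
```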
From my reading of the documentation, I thought that num_samples should be left at its default of 1, and that the hyperparameter search mechanism would create up to max_t=100 different MyTrainableClass instances, each with its own unique self.config, and each having _setup() and then _train() called only once, with the overall goal of searching for the maximum ‘mean_accuracy’.
I am not sure whether what I have described is an issue with ray/tune, or whether I am not using it as intended. Questions:
- Should I be using grid_search() rather than sample_from() for any of the search parameters in my config?
- Can anyone suggest either: a) what I need to change to run an effective hyperparameter search for my MyTrainableClass class, or b) whether what I have explained is a bug in ray/tune?
Thanks
Top GitHub Comments
This should be 45 different configs 😃 (referring to the num_samples=5 example in the next comment: 3 x 3 x 5 = 45, not 90.)
Thanks for opening this issue @andrewv99!

Ah, I can see that being confusing: _train should probably be better named _step. It will be called many times for one parameter configuration (think of it like an epoch). If you’re only going to call _train once, you should set stop={"training_iteration": 1}.

sample_from generates one value every time it is invoked. grid_search makes it so that all of the listed values are evaluated. num_samples multiplies the cardinality of the given specification, i.e.:
- x: grid_search([1, 2, 3, 4]), num_samples=1 = 4 different configs
- x: grid_search([1, 2, 3]), num_samples=1 = 3 different configs
- x: grid_search([1, 2, 3]), num_samples=2 = 6 different configs
- x: grid_search([1, 2, 3]), y: grid_search([a, b, c]), num_samples=1 = 9 different configs
- x: grid_search([1, 2, 3]), y: grid_search([a, b, c]), num_samples=2 = 18 different configs
- x: grid_search([1, 2, 3]), y: grid_search([a, b, c]), num_samples=5 = 90 different configs
- x: sample_from(...), y: grid_search([a, b, c]), num_samples=2 = 6 different configs
- x: sample_from(...), y: sample_from(...), num_samples=13 = 13 different configs

Feel free to ask any questions that you have. Also, if this clears up your confusion, could you provide some suggestions as to what I can do to make this clearer in the docs?
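Putting those two points together for the setup above, a minimal sketch (reusing the names and sampling helpers from the snippets earlier in the issue, with an illustrative num_samples value) would keep sample_from for every parameter, ask for the desired number of distinct configs via num_samples, and stop each trial after a single _train() call:

```python
# Sketch only: with sample_from parameters, num_samples determines how many
# distinct self.config dicts are generated, and stop={"training_iteration": 1}
# makes each trial call _train() exactly once.
exp = Experiment(
    name = "xgb_auc_optimizer",
    run = MyTrainableClass,
    num_samples = 50,                     # 50 distinct configs (illustrative)
    stop = {"training_iteration": 1},     # one _train() call per trial
    resources_per_trial = {"cpu": 1, "gpu": 0},
    config = {
        "eval_metric": 'auc',
        "booster": sample_from(lambda spec: choice(["dart", "gbtree", "gblinear"])),
        "max_depth": sample_from(lambda spec: randint(1, 9)),
        # ... remaining sample_from parameters as in the original config ...
    },
)

tune.run(exp, verbose=0)
```

With single-iteration trials, every trial reports exactly one mean_accuracy, so the breadth of the search is controlled entirely by num_samples.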