KeyError: 'Data point X is not unique'
Hi, and congrats on this sweet piece of software, really love it 😃
I’ve implemented an async solution very similar to the one in the example notebook, with 13 virtual machines connecting to a master server hosting the Tornado webserver (like in the notebook). It seems like I’m now constantly hitting the same error on the master server (the server registering the tested points and their respective targets in the async example):
Error: 'Data point [3.47653914e+04 2.10539196e+02 3.15656650e+00 6.77134492e+00 1.01962491e+01] is not unique'
Traceback (most recent call last):
File "playground_master.py", line 72, in post
self._bo.register(params=body["params"], target=body["target"])
File "/usr/local/lib/python3.5/dist-packages/bayes_opt/bayesian_optimization.py", line 104, in register
self._space.register(params, target)
File "/usr/local/lib/python3.5/dist-packages/bayes_opt/target_space.py", line 161, in register
raise KeyError('Data point {} is not unique'.format(x))
KeyError: 'Data point [3.47653914e+04 2.10539196e+02 3.15656650e+00 6.77134492e+00 1.01962491e+01] is not unique'
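For context, here is roughly what the failing handler looks like. This is a minimal sketch following the async notebook’s structure; everything in it except the register() call visible in the traceback is an assumption about playground_master.py:

import tornado.escape
import tornado.web

class BayesianOptimizationHandler(tornado.web.RequestHandler):
    """Master-side handler: worker VMs POST {"params": ..., "target": ...}."""

    def initialize(self, bo):
        self._bo = bo  # the shared BayesianOptimization instance

    def post(self):
        body = tornado.escape.json_decode(self.request.body)
        # This is the call that raises: register() refuses a point that is
        # already present in the target space.
        self._bo.register(params=body["params"], target=body["target"])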
It seems like the BO is suggesting already-tested points (sets of 5 values in my case) to the 13 VMs, almost constantly after ~800 points tested.
Here’s my troubleshooting so far:
- Using the load/save examples on the front page of BO, I save all tested points in a JSON file; for this instance of the problem, I see 817 points already registered in the JSON (see the sketch after this list)
- The data point thrown in the traceback is indeed ALREADY in the JSON, with an associated target value
- The 13 slave VMs are still asking for “suggestions” (further sets of points to test), but BO keeps handing out sets of points that were already tested; I still see some rare cases where a point is not yet tested and the count increases slightly to 818, 819 points… (but most of the time the traceback is thrown)
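For reference, this is the save/load pattern I mean, a minimal sketch along the lines of the README’s “saving progress” section (the path and the toy pbounds are illustrative):

from bayes_opt import BayesianOptimization
from bayes_opt.event import Events
from bayes_opt.logger import JSONLogger
from bayes_opt.util import load_logs

# Saving: log every point registered on the master to a JSON file
optimizer = BayesianOptimization(f=None, pbounds={'x': (0, 1)}, random_state=1)
logger = JSONLogger(path="./master_logs.json")
optimizer.subscribe(Events.OPTIMIZATION_STEP, logger)

# Loading: replay the JSON into a fresh optimizer and count the points
fresh = BayesianOptimization(f=None, pbounds={'x': (0, 1)}, random_state=1)
load_logs(fresh, logs=["./master_logs.json"])
print(len(fresh.space))  # e.g. 817 in this instance of the problem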
I’m a little bit surprised you can end up with such a scenario given my pbounds is very broad and so has a lot of points to work on without having to test the same ones again:
{'learning_timesteps': (5000, 40000),
'timesteps_per_batch': (4, 72),
'observation_loopback': (1, 20),
'mmr__timeperiod': (7, 365),
'model': (-0.49, 5.49) }
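For completeness, this is how the optimizer on the master is set up, a minimal sketch assuming the async pattern, where f=None because the worker VMs evaluate the objective themselves:

from bayes_opt import BayesianOptimization

# Master-side optimizer: no objective function here, the 13 worker VMs
# evaluate the points themselves and POST the results back
optimizer = BayesianOptimization(
    f=None,
    pbounds={
        'learning_timesteps': (5000, 40000),
        'timesteps_per_batch': (4, 72),
        'observation_loopback': (1, 20),
        'mmr__timeperiod': (7, 365),
        'model': (-0.49, 5.49),
    },
    random_state=1,
)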
This is how I initialized the utility function, which, as far as I understood, is what decides which points get suggested (including, apparently, the already-tested ones):
from bayes_opt import UtilityFunction

_uf = UtilityFunction(kind="ucb", kappa=2.576, xi=0)
Should I modify the acquisition function, or some of the hyperparameters “kappa” and “xi”?
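In case it helps the discussion, this is the kind of change I had in mind, only a sketch, and the larger kappa value is a guess on my part:

from bayes_opt import UtilityFunction

# More exploratory UCB: a larger kappa weights the posterior uncertainty
# more heavily, which should make repeated argmax suggestions less likely.
# `optimizer` is the master-side instance constructed above.
_uf = UtilityFunction(kind="ucb", kappa=10.0, xi=0.0)
next_point = optimizer.suggest(_uf)  # dict with the 5 parameter values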
I see https://github.com/fmfn/BayesianOptimization/issues/92 related to this, but I’m not doing any manual point probing or any initialization; I really stuck to the async notebook example, so I’m not sure that issue applies to me 😦
Let me know if I can share further information & more context on this. Thanks in advance for the help 😃 Lucas
Top GitHub Comments
It seems this could be part of a larger feature related to a “termination” condition - AFAIK, the current code only runs for a specified number of iterations; it does not have a “convergence criterion”. The error arises from the `register` function finding an identical point, so when the optimizer gets “stuck” here I think the same point will be repeatedly suggested.
After suggesting a duplicate point, the point is not registered (the `try` statement fails), and it will suggest the same point on the next iteration (the posterior hasn’t changed). So, it may be possible to detect this immediately (i.e., after only a single duplicate suggestion). It isn’t really an error; it is the optimizer getting stuck at a point where the utility function is higher than at any other unprobed location. This tends to happen with highly exploitative values of `kappa` or `xi`, so we could also suggest a higher value in the error message if this occurs.
fixed in #372
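For completeness, a sketch of the kind of immediate duplicate detection described above (my own illustration of the idea, not the actual fix that landed in #372); it assumes the master-side optimizer and retries suggest() with a progressively more exploratory kappa:

import numpy as np
from bayes_opt import UtilityFunction

def suggest_unprobed(optimizer, kappa=2.576, max_retries=5):
    """Suggest a point, doubling kappa (more exploration) on duplicates."""
    for attempt in range(max_retries):
        utility = UtilityFunction(kind="ucb", kappa=kappa * 2 ** attempt, xi=0.0)
        point = optimizer.suggest(utility)
        x = np.asarray([point[key] for key in optimizer.space.keys])
        # optimizer.space.params holds every point registered so far
        if not any(np.allclose(x, row) for row in optimizer.space.params):
            return point
    # Still stuck after several kappa bumps: fall back to a random probe
    return dict(zip(optimizer.space.keys, optimizer.space.random_sample()))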