Duplicate entry '[Experiment Name]' for key 'name' in Katib DB
/kind bug
What steps did you take and what happened:
First step: we cleared all existing experiments, so the experiments table in the Katib DB is empty.
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| katib |
| mysql |
| performance_schema |
| sys |
+--------------------+
5 rows in set (0.00 sec)
mysql> use katib;
Database changed
mysql> show tables;
+--------------------------+
| Tables_in_katib |
+--------------------------+
| experiments |
| extra_algorithm_settings |
| observation_logs |
| trials |
+--------------------------+
mysql> describe experiments;
+------------------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(255) | NO | UNI | NULL | |
| parameters | text | YES | | NULL | |
| objective | text | YES | | NULL | |
| algorithm | text | YES | | NULL | |
| trial_template | text | YES | | NULL | |
| metrics_collector_spec | text | YES | | NULL | |
| parallel_trial_count | int(11) | YES | | NULL | |
| max_trial_count | int(11) | YES | | NULL | |
| status | tinyint(4) | YES | | NULL | |
| start_time | datetime(6) | YES | | NULL | |
| completion_time | datetime(6) | YES | | NULL | |
| nas_config | text | YES | | NULL | |
+------------------------+--------------+------+-----+---------+----------------+
13 rows in set (0.00 sec)
mysql> SELECT * FROM experiments;
Empty set (0.00 sec)
Second step: we launched a new experiment.
kubectl get experiment
NAME STATUS AGE
katib-mnist-with-summaries-from-hdfs-3 Running 6m16s
One new record in the DB:
mysql> SELECT * FROM experiments;
+----+----------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------+----------------------------+----------------+------------------------+----------------------+-----------------+--------+----------------------------+----------------------------+------------+
| id | name | parameters | objective | algorithm | trial_template | metrics_collector_spec | parallel_trial_count | max_trial_count | status | start_time | completion_time | nas_config |
+----+----------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------+----------------------------+----------------+------------------------+----------------------+-----------------+--------+----------------------------+----------------------------+------------+
| 25 | katib-mnist-with-summaries-from-hdfs-3 | {"parameters":[{"name":"--learning_rate","parameterType":"DOUBLE","feasibleSpace":{"max":"0.05","min":"0.01"}},{"name":"--batch_size","parameterType":"INT","feasibleSpace":{"max":"200","min":"100"}}]} | {"type":"MAXIMIZE","goal":0.99,"objectiveMetricName":"accuracy_1"} | {"algorithmName":"random"} | | | 3 | 0 | 1 | 2019-08-01 05:17:08.000000 | 0001-01-01 00:00:00.000000 | |
+----+----------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------+----------------------------+----------------+------------------------+----------------------+-----------------+--------+----------------------------+----------------------------+------------+
1 row in set (0.00 sec)
In the Katib controller log, we see the error Duplicate entry 'katib-mnist-with-summaries-from-hdfs-3' for key 'name':
{"level":"error","ts":1564637349.4766934,"logger":"experiment-controller","caller":"experiment/experiment_controller.go:250","msg":"Create experiment in DB error","Experiment":"ml-algorithms/katib-mnist-with-summaries-from-hdfs-3","error":"rpc error: code = Unknown desc = Error 1062: Duplicate entry 'katib-mnist-with-summaries-from-hdfs-3' for key 'name'","stacktrace":"github.com/kubeflow/katib/vendor/github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/kubeflow/katib/vendor/github.com/go-logr/zapr/zapr.go:128\ngithub.com/kubeflow/katib/pkg/controller/v1alpha2/experiment.(*ReconcileExperiment).Reconcile\n\t/go/src/github.com/kubeflow/katib/pkg/controller/v1alpha2/experiment/experiment_controller.go:250\ngithub.com/kubeflow/katib/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/kubeflow/katib/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:207\ngithub.com/kubeflow/katib/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/go/src/github.com/kubeflow/katib/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:157\ngithub.com/kubeflow/katib/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/src/github.com/kubeflow/katib/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\ngithub.com/kubeflow/katib/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/github.com/kubeflow/katib/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134\ngithub.com/kubeflow/katib/vendor/k8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/github.com/kubeflow/katib/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}
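For anyone triaging a larger controller log dump, the relevant fields can be pulled out of each JSON line. The following is a minimal sketch, assuming every log line is a single JSON object shaped like the one above. The `summarize` helper is hypothetical and not part of Katib, and the stack trace is left out of the sample line for brevity.

```python
import json

# Shortened copy of the controller log line above; the "stacktrace" field is
# omitted here because the summary does not need it.
log_line = (
    '{"level":"error","ts":1564637349.4766934,'
    '"logger":"experiment-controller",'
    '"caller":"experiment/experiment_controller.go:250",'
    '"msg":"Create experiment in DB error",'
    '"Experiment":"ml-algorithms/katib-mnist-with-summaries-from-hdfs-3",'
    '"error":"rpc error: code = Unknown desc = Error 1062: Duplicate entry '
    "'katib-mnist-with-summaries-from-hdfs-3' for key 'name'\"}"
)


def summarize(line: str) -> dict:
    """Hypothetical helper: extract the fields useful for triage."""
    record = json.loads(line)
    # The "Experiment" field has the form "<namespace>/<name>".
    namespace, _, name = record["Experiment"].partition("/")
    return {
        "caller": record["caller"],
        "msg": record["msg"],
        "namespace": namespace,
        "name": name,
        # MySQL error 1062 is the duplicate-entry error for a unique key.
        "is_duplicate_key": "Error 1062" in record["error"],
    }


summary = summarize(log_line)
print(summary)
```

Note that the log identifies the experiment as namespace plus name, while the error message mentions only the bare name.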
Question
Why does Katib try to insert a duplicate experiment name into the DB? What is the design here? The name column in the DB schema is marked as unique.
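To make the constraint concrete, here is a small sketch of what a UNIQUE name column does when the same create call runs twice, and one possible way to make the call tolerate an existing row. It uses an in-memory SQLite database as a stand-in for the MySQL katib database and reproduces only the id and name columns. Both functions are hypothetical and are not Katib's code, and whether a repeated create is actually the cause of the error in this issue has not been established.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the MySQL katib database.
# Only the id and name columns of the experiments table are reproduced.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE experiments ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "name VARCHAR(255) NOT NULL UNIQUE)"
)


def create_experiment(name: str) -> None:
    """Plain insert: raises IntegrityError if the name already exists."""
    with conn:
        conn.execute("INSERT INTO experiments (name) VALUES (?)", (name,))


def create_experiment_idempotent(name: str) -> bool:
    """Hypothetical variant: True if a row was inserted, False if it existed."""
    with conn:
        cur = conn.execute(
            "INSERT OR IGNORE INTO experiments (name) VALUES (?)", (name,)
        )
    return cur.rowcount == 1


name = "katib-mnist-with-summaries-from-hdfs-3"
create_experiment(name)

# A second plain insert with the same name violates the UNIQUE constraint.
try:
    create_experiment(name)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The tolerant variant leaves the existing row alone instead of failing.
inserted_again = create_experiment_idempotent(name)
row_count = conn.execute("SELECT COUNT(*) FROM experiments").fetchone()[0]
print(duplicate_rejected, inserted_again, row_count)
```

In MySQL the comparable statements would be INSERT IGNORE or INSERT ... ON DUPLICATE KEY UPDATE. Whether silently skipping the insert is the right behaviour depends on what the controller expects, which is the design question raised here.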
In a previous discussion, @johnugeorge mentioned that Trials are not completed until their metrics are persisted in the DB. A metrics collector cronjob per trial runs every minute, collects the metrics, and writes them to the DB.
Is there anything we should change on our side to make the DB work? Is the data fed into it wrong, or is the table schema wrong?
What did you expect to happen:
No duplicate entries into the Katib DB.
Maybe use the Trial as name instead of the Experiment? I'm not sure what the design of Katib is here.
Anything else you would like to add:
Environment:
- Kubeflow version: 0.5
- Minikube version: N/A, own cluster
- Kubernetes version (use kubectl version):
  Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.0", GitCommit:"e8462b5b5dc2584fdcd18e6bcfe9f1e4d970a529", GitTreeState:"clean", BuildDate:"2019-06-19T16:40:16Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"darwin/amd64"}
  Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.0", GitCommit:"641856db18352033a0d96dbc99153fa3b27298e5", GitTreeState:"clean", BuildDate:"2019-03-25T15:45:25Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}
- OS (e.g. from /etc/os-release): rhel6
Issue Analytics
- Created 4 years ago
- Comments:15 (8 by maintainers)
Top GitHub Comments
Can you provide me a dump of controller logs?
@gaocegege: Closing this issue.