Error running katib on latest master (04/13)

After deploying Katib by following the getting-started guide, I see the following errors:

$ kubectl get pods --all-namespaces
NAMESPACE     NAME                                      READY     STATUS             RESTARTS   AGE
katib         dlk-manager-698ccb5fdc-hb7xc              0/1       CrashLoopBackOff   6          13m
katib         modeldb-backend-6855d95fb4-2sxw9          1/1       Running            0          14m
katib         modeldb-db-6cf5bb764-5s65f                1/1       Running            0          14m
katib         modeldb-frontend-5868bffc64-rhrr7         1/1       Running            0          14m
katib         vizier-core-86c5566c88-kvsp9              0/1       CrashLoopBackOff   6          13m
katib         vizier-db-64557596dc-mpgh4                1/1       Running            0          13m
katib         vizier-suggestion-random-6b4d6db6-m8l94   0/1       CrashLoopBackOff   6          13m
kube-system   kube-dns-5c6c5b55b-qmd9l                  3/3       Running            0          16m

I managed to get it running; it turns out the container command is not correct. For example, I had to change this:

    spec:
      serviceAccountName: vizier-core
      containers:
      - name: vizier-core
        image: katib/vizier-core
        args:
          - "-w"
          - "dlk"
        ports:
        - name: api
          containerPort: 6789

to

    spec:
      serviceAccountName: vizier-core
      containers:
      - name: vizier-core
        image: katib/vizier-core
        args:
          - ./vizier-manager    # <-- add this line
          - "-w"
          - "dlk"
        ports:
        - name: api
          containerPort: 6789

However, according to the Dockerfile for vizier-core, vizier-manager is already set as the entrypoint:

FROM golang:alpine AS build-env
# The GOPATH in the image is /go.
ADD . /go/src/github.com/kubeflow/hp-tuning
WORKDIR /go/src/github.com/kubeflow/hp-tuning/manager
RUN go build -o vizier-manager

FROM alpine:3.7
WORKDIR /app
COPY --from=build-env /go/src/github.com/kubeflow/hp-tuning/manager/vizier-manager /app/
COPY --from=build-env /go/src/github.com/kubeflow/hp-tuning/manager/visualise /
ENTRYPOINT ["./vizier-manager"]
CMD ["-w", "dlk"]
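For reference, here is a minimal sketch of how Kubernetes combines a container's `command` and `args` with the image's ENTRYPOINT and CMD: `command` replaces ENTRYPOINT, `args` replaces CMD, and the image's CMD is ignored when `command` is set. The function name `effective_argv` is hypothetical and only models the rule; it is not part of Kubernetes or Katib.

```python
from typing import List, Optional


def effective_argv(
    image_entrypoint: List[str],
    image_cmd: List[str],
    command: Optional[List[str]] = None,
    args: Optional[List[str]] = None,
) -> List[str]:
    """Model the argv a container starts with (hypothetical helper)."""
    if command is not None:
        # `command` replaces ENTRYPOINT; the image CMD is ignored,
        # so only explicitly supplied `args` are appended.
        return list(command) + list(args or [])
    if args is not None:
        # Only `args` given: the image ENTRYPOINT runs with these args.
        return list(image_entrypoint) + list(args)
    # Neither given: the image defaults apply.
    return list(image_entrypoint) + list(image_cmd)


# Image built from the Dockerfile above, with the original manifest.
original = effective_argv(["./vizier-manager"], ["-w", "dlk"], args=["-w", "dlk"])

# Same image with the workaround: the binary name becomes an argument.
workaround = effective_argv(
    ["./vizier-manager"], ["-w", "dlk"], args=["./vizier-manager", "-w", "dlk"]
)

# Assumed image with no ENTRYPOINT, with the workaround.
no_entrypoint = effective_argv([], [], args=["./vizier-manager", "-w", "dlk"])

print(original)
print(workaround)
print(no_entrypoint)
```

Under this model, the original manifest should already start `./vizier-manager -w dlk` on an image built from the Dockerfile above, while the workaround would pass the binary name as an extra argument. The workaround only yields a clean command line when the image has no ENTRYPOINT, which suggests (as an inference, not something confirmed here) that the deployed image did not match this Dockerfile.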

Is anything wrong with the setup above? 👆

/cc @gaocegege @YujiOshima

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 7 (7 by maintainers)

Top GitHub Comments

2 reactions
YujiOshima commented, Apr 13, 2018

Hi @ddysher. Are you using the katib/~ Docker images? I'm sorry, I had not updated the images to the latest version; that step is not automated. I have now updated the images, so please retry. If you still have a problem, show me the log of vizier-core: kubectl -n katib logs deploy/vizier-core

0 reactions
ddysher commented, Apr 16, 2018

Sure.
