
katib-controller crashes with "invalid memory address or nil pointer dereference"


/kind bug

What steps did you take and what happened:

  1. Install Kubeflow 1.2 on Kubernetes 1.16.15 with the official kfctl.

  2. Start a notebook and run the following script: mnist-pipeline.txt

  3. The katib-controller pod enters CrashLoopBackOff and stays there indefinitely, with the following logs:

{"level":"info","ts":1615690131.710228,"logger":"entrypoint","msg":"Config:","experiment-suggestion-name":"default","cert-local-filesystem":false,"webhook-port":8443,"metrics-addr":":8080","inject-security-context":false,"enable-grpc-probe-in-suggestion":true,"trial-resources":[{"Group":"batch","Version":"v1","Kind":"Job"},{"Group":"kubeflow.org","Version":"v1","Kind":"TFJob"},{"Group":"kubeflow.org","Version":"v1","Kind":"PyTorchJob"},{"Group":"kubeflow.org","Version":"v1","Kind":"MPIJob"},{"Group":"tekton.dev","Version":"v1beta1","Kind":"PipelineRun"}]}
{"level":"info","ts":1615690131.8203886,"logger":"entrypoint","msg":"Registering Components."}
{"level":"info","ts":1615690131.821229,"logger":"entrypoint","msg":"Setting up controller"}
{"level":"info","ts":1615690131.8212914,"logger":"experiment-controller","msg":"Using the default suggestion implementation"}
{"level":"info","ts":1615690131.8215294,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"experiment-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1615690131.821822,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"experiment-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1615690131.8220415,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"experiment-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1615690131.8222158,"logger":"experiment-controller","msg":"Experiment controller created"}
{"level":"info","ts":1615690131.8223069,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"suggestion-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1615690131.8223562,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"suggestion-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1615690131.822521,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"suggestion-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1615690131.8226724,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"suggestion-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1615690131.8228512,"logger":"suggestion-controller","msg":"Suggestion controller created"}
{"level":"info","ts":1615690131.8230324,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"trial-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1615690131.8231058,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"trial-controller","source":"kind source: batch/v1, Kind=Job"}
{"level":"info","ts":1615690131.8232667,"logger":"trial-controller","msg":"Job watch added successfully","CRD Group":"batch","CRD Version":"v1","CRD Kind":"Job"}
{"level":"info","ts":1615690131.8233113,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"trial-controller","source":"kind source: kubeflow.org/v1, Kind=TFJob"}
{"level":"info","ts":1615690131.823487,"logger":"trial-controller","msg":"Job watch added successfully","CRD Group":"kubeflow.org","CRD Version":"v1","CRD Kind":"TFJob"}
{"level":"info","ts":1615690131.8235776,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"trial-controller","source":"kind source: kubeflow.org/v1, Kind=PyTorchJob"}
{"level":"info","ts":1615690131.8237944,"logger":"trial-controller","msg":"Job watch added successfully","CRD Group":"kubeflow.org","CRD Version":"v1","CRD Kind":"PyTorchJob"}
{"level":"info","ts":1615690131.823831,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"trial-controller","source":"kind source: kubeflow.org/v1, Kind=MPIJob"}
{"level":"info","ts":1615690131.8239534,"logger":"trial-controller","msg":"Job watch added successfully","CRD Group":"kubeflow.org","CRD Version":"v1","CRD Kind":"MPIJob"}
{"level":"info","ts":1615690131.8239853,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"trial-controller","source":"kind source: tekton.dev/v1beta1, Kind=PipelineRun"}
{"level":"error","ts":1615690131.8240302,"logger":"kubebuilder.source","msg":"if kind is a CRD, it should be installed before calling Start","kind":{"Group":"tekton.dev","Kind":"PipelineRun"},"error":"no matches for kind \"PipelineRun\" in version \"tekton.dev/v1beta1\"","stacktrace":"github.com/kubeflow/katib/vendor/github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/kubeflow/katib/vendor/github.com/go-logr/zapr/zapr.go:128\ngithub.com/kubeflow/katib/vendor/sigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start\n\t/go/src/github.com/kubeflow/katib/vendor/sigs.k8s.io/controller-runtime/pkg/source/source.go:89\ngithub.com/kubeflow/katib/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Watch\n\t/go/src/github.com/kubeflow/katib/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:122\ngithub.com/kubeflow/katib/pkg/controller.v1beta1/trial.add\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1beta1/trial/trial_controller.go:106\ngithub.com/kubeflow/katib/pkg/controller.v1beta1/trial.Add\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1beta1/trial/trial_controller.go:65\ngithub.com/kubeflow/katib/pkg/controller%2ev1beta1.AddToManager\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1beta1/controller.go:28\nmain.main\n\t/go/src/github.com/kubeflow/katib/cmd/katib-controller/v1beta1/main.go:112\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:204"}
{"level":"info","ts":1615690131.8242824,"logger":"trial-controller","msg":"Job watch error. CRD might be missing. Please install CRD and restart katib-controller","CRD Group":"tekton.dev","CRD Version":"v1beta1","CRD Kind":"PipelineRun"}
{"level":"info","ts":1615690131.8243027,"logger":"trial-controller","msg":"Trial controller created"}
{"level":"info","ts":1615690131.8243096,"logger":"entrypoint","msg":"Setting up webhooks"}
{"level":"info","ts":1615690131.8245256,"logger":"entrypoint","msg":"Starting the Cmd."}
{"level":"info","ts":1615690131.9251847,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"trial-controller"}
{"level":"info","ts":1615690131.9252026,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"suggestion-controller"}
{"level":"info","ts":1615690131.9251676,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"experiment-controller"}
{"level":"info","ts":1615690131.9251678,"logger":"kubebuilder.webhook","msg":"installing webhook configuration in cluster"}
{"level":"info","ts":1615690132.0258567,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"suggestion-controller","worker count":1}
{"level":"info","ts":1615690132.025887,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"trial-controller","worker count":1}
{"level":"info","ts":1615690132.0259328,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"experiment-controller","worker count":1}
E0314 02:48:52.027598 1 runtime.go:69] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
/go/src/github.com/kubeflow/katib/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:76
/go/src/github.com/kubeflow/katib/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
/go/src/github.com/kubeflow/katib/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
/usr/local/go/src/runtime/panic.go:969
/usr/local/go/src/runtime/panic.go:212
/usr/local/go/src/runtime/signal_unix.go:720
/go/src/github.com/kubeflow/katib/pkg/controller.v1beta1/experiment/experiment_controller.go:294
/go/src/github.com/kubeflow/katib/pkg/controller.v1beta1/experiment/experiment_controller.go:283
/go/src/github.com/kubeflow/katib/pkg/controller.v1beta1/experiment/experiment_controller.go:239
/go/src/github.com/kubeflow/katib/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:215
/go/src/github.com/kubeflow/katib/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158
/go/src/github.com/kubeflow/katib/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
/go/src/github.com/kubeflow/katib/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134
/go/src/github.com/kubeflow/katib/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
/usr/local/go/src/runtime/asm_amd64.s:1374
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x11cf162]

goroutine 378 [running]:
github.com/kubeflow/katib/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/go/src/github.com/kubeflow/katib/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:58 +0x10c
panic(0x1507140, 0x2229490)
	/usr/local/go/src/runtime/panic.go:969 +0x1b9
github.com/kubeflow/katib/pkg/controller.v1beta1/experiment.(*ReconcileExperiment).ReconcileTrials(0xc000403320, 0xc0003a0840, 0x2276968, 0x0, 0x0, 0xc0001be310, 0x0)
	/go/src/github.com/kubeflow/katib/pkg/controller.v1beta1/experiment/experiment_controller.go:294 +0x142
github.com/kubeflow/katib/pkg/controller.v1beta1/experiment.(*ReconcileExperiment).ReconcileExperiment(0xc000403320, 0xc0003a0840, 0x2276968, 0x0)
	/go/src/github.com/kubeflow/katib/pkg/controller.v1beta1/experiment/experiment_controller.go:283 +0x38b
github.com/kubeflow/katib/pkg/controller.v1beta1/experiment.(*ReconcileExperiment).Reconcile(0xc000403320, 0xc000611f10, 0x8, 0xc000782c60, 0x2a, 0x203000, 0x203000, 0xc000062800, 0x7f173e2e2e00)
	/go/src/github.com/kubeflow/katib/pkg/controller.v1beta1/experiment/experiment_controller.go:239 +0x768
github.com/kubeflow/katib/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc000544fa0, 0x18b6200)
	/go/src/github.com/kubeflow/katib/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:215 +0x1de
github.com/kubeflow/katib/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1()
	/go/src/github.com/kubeflow/katib/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158 +0x36
github.com/kubeflow/katib/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc000d1a340)
	/go/src/github.com/kubeflow/katib/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x5f
github.com/kubeflow/katib/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc000d1a340, 0x3b9aca00, 0x0, 0x100000000000001, 0xc000578a80)
	/go/src/github.com/kubeflow/katib/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134 +0x105
github.com/kubeflow/katib/vendor/k8s.io/apimachinery/pkg/util/wait.Until(0xc000d1a340, 0x3b9aca00, 0xc000578a80)
	/go/src/github.com/kubeflow/katib/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88 +0x4d
created by github.com/kubeflow/katib/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start
	/go/src/github.com/kubeflow/katib/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:157 +0x331
invalid memory address or nil pointer dereference

What did you expect to happen:

The experiment is successful and kubeflow is running normally.
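
For context on the panic message: Go raises this runtime error whenever code reads through a nil pointer. The sketch below uses hypothetical types (record and details are not Katib's types) to show the failure mode and the kind of nil check that avoids it. The trace above shows only that the dereference happened in ReconcileTrials; it does not reveal which value was nil, so this is an illustration rather than a diagnosis.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical types for illustration only.
type details struct {
	Items []string
}

type record struct {
	Name    string
	Details *details // stays nil until something populates it
}

// countUnsafe reads through the pointer without checking it and
// panics with a nil pointer dereference when Details is nil.
func countUnsafe(r *record) int {
	return len(r.Details.Items)
}

// countSafe reports missing data as an error instead of panicking.
func countSafe(r *record) (int, error) {
	if r == nil || r.Details == nil {
		return 0, errors.New("record has no details yet")
	}
	return len(r.Details.Items), nil
}

func main() {
	r := &record{Name: "demo"}

	if _, err := countSafe(r); err != nil {
		fmt.Println("skipping:", err)
	}

	// Recover here only so the demonstration does not crash the program.
	defer func() {
		if p := recover(); p != nil {
			fmt.Println("recovered:", p)
		}
	}()
	countUnsafe(r)
}
```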

Environment:

  • Kubeflow version (kfctl version): v1.2.0-0-gbc038f9
  • Minikube version (minikube version):
  • Kubernetes version: (use kubectl version): 1.16
  • OS (e.g. from /etc/os-release): CentOS 7

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 16 (6 by maintainers)

Top GitHub Comments

1 reaction
bvboca commented, Mar 19, 2021

@andreyvelich Thanks for your help!

0 reactions
andreyvelich commented, Mar 23, 2021

I have a question. In the early-stopping.ipynb example, the model is trained by a Kubernetes Job, while the official doc for MXNet training recommends MXJob. What's the difference between them?

I believe MXJob is just one of the distributed training operators that Kubeflow provides. Since Katib supports any Kubernetes resource as a Trial template, you can easily use MXJob instead of a Kubernetes Job.

We don't have an example that runs MXJob, but such a contribution would be great.

You can learn more about MXJob here: https://github.com/kubeflow/mxnet-operator.


