Error while trying to run workflow on kubernetes cluster.

@narendra36 You should set both DEFAULT_DOCKER_REGISTRY (URL, e.g. https://index.docker.io/v1/) and DEFAULT_DOCKER_SECRET (the name of a k8s secret in the same namespace). Another way is to set the registry per function by calling this method on your function object: fn.build_config(image='target/image:tag', secret='my_docker')
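
The two options in this comment can be sketched in Python as follows. This is only an illustration under stated assumptions: the registry URL, the image target/image:tag and the secret name my_docker are the placeholders from the comment; the helper names and the StubFunction class are hypothetical stand-ins so the snippet runs without MLRun installed; and the build_config(image=..., secret=...) call is taken from the comment, not checked against a specific MLRun release.

```python
import os


def set_default_registry(registry_url, secret_name, env=None):
    """Option 1: set the cluster-wide defaults as environment variables.

    The two variable names come from the comment; the values are your own
    registry URL and the name of a Kubernetes secret in the same namespace.
    """
    env = os.environ if env is None else env
    env["DEFAULT_DOCKER_REGISTRY"] = registry_url
    env["DEFAULT_DOCKER_SECRET"] = secret_name
    return env


def set_function_registry(fn, image, secret_name):
    """Option 2: set the target image and secret on a single function."""
    fn.build_config(image=image, secret=secret_name)
    return fn


class StubFunction:
    """Hypothetical stand-in for an MLRun function object (not MLRun code)."""

    def __init__(self):
        self.build = {}

    def build_config(self, image="", secret=None):
        # Just record the values so the call can be inspected afterwards.
        self.build = {"image": image, "secret": secret}


# Use a plain dict here so the example does not modify the real environment.
env = set_default_registry("https://index.docker.io/v1/", "my_docker", env={})
fn = set_function_registry(StubFunction(), "target/image:tag", "my_docker")
print(env)
print(fn.build)
```

With a real function object, only the set_function_registry call would be needed; the stub exists purely to make the sketch executable.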

I have these set but I am still facing this issue.

[Image: Screenshot 2021-02-05 at 10 02 31 AM]

kubectl logs demo-training-pipeline-bn97r-3933567025 -n kubeflow wait

time="2021-02-05T16:43:48Z" level=info msg="Starting Workflow Executor" version=v2.7.5+ede163e.dirty
time="2021-02-05T16:43:49Z" level=info msg="Creating PNS executor (namespace: kubeflow, pod: demo-training-pipeline-bn97r-3933567025, pid: 6, hasOutputs: true)"
time="2021-02-05T16:43:49Z" level=info msg="Executor (version: v2.7.5+ede163e.dirty, build_date: 2020-04-21T01:12:08Z) initialized (pod: kubeflow/demo-training-pipeline-bn97r-3933567025) with template:\n{\"name\":\"deploy-gen-iris\",\"arguments\":{},\"inputs\":{},\"outputs\":{\"parameters\":[{\"name\":\"deploy-gen-iris-image\",\"valueFrom\":{\"path\":\"/tmp/image\"}}],\"artifacts\":[{\"name\":\"deploy-gen-iris-image\",\"path\":\"/tmp/image\"},{\"name\":\"deploy-gen-iris-state\",\"path\":\"/tmp/state\"}]},\"metadata\":{\"annotations\":{\"sidecar.istio.io/inject\":\"false\"},\"labels\":{\"pipelines.kubeflow.org/cache_enabled\":\"true\"}},\"container\":{\"name\":\"\",\"image\":\"mlrun/mlrun:0.5.5-rc3\",\"command\":[\"python\",\"-m\",\"mlrun\",\"build\",\"--kfp\",\"-r\",\"{'kind': 'job', 'metadata': {'name': 'gen-iris', 'tag': '', 'project': 'sk-project'}, 'spec': {'command': '', 'args': [], 'volumes': [{'name': 'pvc-55208e8c-6cf1-483f-a107-bea804c96384', 'persistentVolumeClaim': {'claimName': 'mlrun-kit-jupyter-pvc'}}], 'volume_mounts': [{'mountPath': '/home/jovyan/data', 'name': 'pvc-55208e8c-6cf1-483f-a107-bea804c96384'}], 'env': [], 'default_handler': '', 'entry_points': {'iris_generator': {'name': 'iris_generator', 'doc': '', 'parameters': [{'name': 'context', 'default': ''}, {'name': 'format', 'default': 'csv'}], 'outputs': [{'default': ''}], 'lineno': 11}}, 'description': '', 'build': {'functionSourceCode': 
'IyBHZW5lcmF0ZWQgYnkgbnVjbGlvLmV4cG9ydC5OdWNsaW9FeHBvcnRlcgoKaW1wb3J0IG9zCmZyb20gc2tsZWFybi5kYXRhc2V0cyBpbXBvcnQgbG9hZF9pcmlzCmZyb20gc2tsZWFybi5tb2RlbF9zZWxlY3Rpb24gaW1wb3J0IHRyYWluX3Rlc3Rfc3BsaXQKaW1wb3J0IG51bXB5IGFzIG5wCmZyb20gc2tsZWFybi5tZXRyaWNzIGltcG9ydCBhY2N1cmFjeV9zY29yZQpmcm9tIG1scnVuLmFydGlmYWN0cyBpbXBvcnQgVGFibGVBcnRpZmFjdCwgUGxvdEFydGlmYWN0CmltcG9ydCBwYW5kYXMgYXMgcGQKCmRlZiBpcmlzX2dlbmVyYXRvcihjb250ZXh0LCBmb3JtYXQ9J2NzdicpOgogICAgaXJpcyA9IGxvYWRfaXJpcygpCiAgICBpcmlzX2RhdGFzZXQgPSBwZC5EYXRhRnJhbWUoZGF0YT1pcmlzLmRhdGEsIGNvbHVtbnM9aXJpcy5mZWF0dXJlX25hbWVzKQogICAgaXJpc19sYWJlbHMgPSBwZC5EYXRhRnJhbWUoZGF0YT1pcmlzLnRhcmdldCwgY29sdW1ucz1bJ2xhYmVsJ10pCiAgICBpcmlzX2RhdGFzZXQgPSBwZC5jb25jYXQoW2lyaXNfZGF0YXNldCwgaXJpc19sYWJlbHNdLCBheGlzPTEpCiAgICAKICAgIGNvbnRleHQubG9nZ2VyLmluZm8oJ3NhdmluZyBpcmlzIGRhdGFmcmFtZSB0byB7fScuZm9ybWF0KGNvbnRleHQuYXJ0aWZhY3RfcGF0aCkpCiAgICBjb250ZXh0LmxvZ19kYXRhc2V0KCdpcmlzX2RhdGFzZXQnLCBkZj1pcmlzX2RhdGFzZXQsIGZvcm1hdD1mb3JtYXQsIGluZGV4PUZhbHNlKQoK', 'base_image': 'mlrun/mlrun', 'commands': ['pip install sklearn', 'pip install pyarrow']}}}\",\"--with_mlrun\",\"--skip\"],\"env\":[{\"name\":\"DEFAULT_DOCKER_REGISTRY\",\"value\":\"index.docker.io/falkonryml\"},{\"name\":\"MLRUN_NAMESPACE\",\"valueFrom\":{\"fieldRef\":{\"fieldPath\":\"metadata.namespace\"}}},{\"name\":\"MLRUN_ARTIFACT_PATH\",\"value\":\"/home/jovyan/demos/scikit-learn-pipeline/pipe/2536a5ff-5740-44d8-96f8-26316be0611c\"}],\"resources\":{}},\"archiveLocation\":{\"archiveLogs\":true,\"s3\":{\"endpoint\":\"minio-service.kubeflow:9000\",\"bucket\":\"mlpipeline\",\"insecure\":true,\"accessKeySecret\":{\"name\":\"mlpipeline-minio-artifact\",\"key\":\"accesskey\"},\"secretKeySecret\":{\"name\":\"mlpipeline-minio-artifact\",\"key\":\"secretkey\"},\"key\":\"artifacts/demo-training-pipeline-bn97r/demo-training-pipeline-bn97r-3933567025\"}}}"
time="2021-02-05T16:43:49Z" level=info msg="Waiting on main container"
time="2021-02-05T16:43:49Z" level=warning msg="Polling root processes (1m0s)"
time="2021-02-05T16:43:49Z" level=info msg="pid 23: &{root 4096 2147484013 {990000000 63747380757 0x2d10800} {2067 96 19 16749 0 0 0 0 4096 4096 8 {1611783965 816134965} {1611783957 990000000} {1611783957 990000000} [0 0 0]}}"
time="2021-02-05T16:43:49Z" level=info msg="Secured filehandle on /proc/23/root"
time="2021-02-05T16:43:49Z" level=info msg="containerID c1afed5dcc0e0e4224c5587dc5698e96c270a652199b4fadbdf788455e0d14e4 mapped to pid 23"
time="2021-02-05T16:43:49Z" level=info msg="pid 23: &{root 4096 2147484141 {136093600 63748140229 0x2d10800} {2097271 11666555 1 16877 0 0 0 0 4096 4096 8 {1612543429 136093600} {1612543429 136093600} {1612543429 467100230} [0 0 0]}}"
time="2021-02-05T16:43:49Z" level=info msg="Secured filehandle on /proc/23/root"
time="2021-02-05T16:43:49Z" level=info msg="pid 23: &{root 4096 2147484141 {136093600 63748140229 0x2d10800} {2097271 11666555 1 16877 0 0 0 0 4096 4096 8 {1612543429 136093600} {1612543429 136093600} {1612543429 467100230} [0 0 0]}}"
time="2021-02-05T16:43:49Z" level=info msg="pid 23: &{root 4096 2147484141 {136093600 63748140229 0x2d10800} {2097271 11666555 1 16877 0 0 0 0 4096 4096 8 {1612543429 593102754} {1612543429 136093600} {1612543429 592102734} [0 0 0]}}"
time="2021-02-05T16:43:49Z" level=info msg="pid 23: &{root 4096 2147484141 {136093600 63748140229 0x2d10800} {2097271 11666555 1 16877 0 0 0 0 4096 4096 8 {1612543429 593102754} {1612543429 136093600} {1612543429 592102734} [0 0 0]}}"
time="2021-02-05T16:43:49Z" level=info msg="pid 23: &{root 4096 2147484141 {136093600 63748140229 0x2d10800} {2097271 11666555 1 16877 0 0 0 0 4096 4096 8 {1612543429 593102754} {1612543429 136093600} {1612543429 592102734} [0 0 0]}}"
time="2021-02-05T16:43:49Z" level=info msg="pid 23: &{root 4096 2147484141 {136093600 63748140229 0x2d10800} {2097271 11666555 1 16877 0 0 0 0 4096 4096 8 {1612543429 593102754} {1612543429 136093600} {1612543429 592102734} [0 0 0]}}"
time="2021-02-05T16:43:49Z" level=info msg="pid 23: &{root 4096 2147484141 {136093600 63748140229 0x2d10800} {2097271 11666555 1 16877 0 0 0 0 4096 4096 8 {1612543429 593102754} {1612543429 136093600} {1612543429 805107001} [0 0 0]}}"
time="2021-02-05T16:43:49Z" level=info msg="pid 23: &{root 4096 2147484141 {136093600 63748140229 0x2d10800} {2097271 11666555 1 16877 0 0 0 0 4096 4096 8 {1612543429 593102754} {1612543429 136093600} {1612543429 805107001} [0 0 0]}}"
time="2021-02-05T16:43:49Z" level=info msg="pid 23: &{root 4096 2147484141 {136093600 63748140229 0x2d10800} {2097271 11666555 1 16877 0 0 0 0 4096 4096 8 {1612543429 593102754} {1612543429 136093600} {1612543429 805107001} [0 0 0]}}"
time="2021-02-05T16:43:49Z" level=info msg="pid 23: &{root 4096 2147484141 {136093600 63748140229 0x2d10800} {2097271 11666555 1 16877 0 0 0 0 4096 4096 8 {1612543429 593102754} {1612543429 136093600} {1612543429 805107001} [0 0 0]}}"
time="2021-02-05T16:43:50Z" level=info msg="pid 23: &{root 4096 2147484141 {136093600 63748140229 0x2d10800} {2097271 11666555 1 16877 0 0 0 0 4096 4096 8 {1612543429 593102754} {1612543429 136093600} {1612543429 805107001} [0 0 0]}}"
time="2021-02-05T16:43:50Z" level=info msg="pid 23: &{root 4096 2147484141 {136093600 63748140229 0x2d10800} {2097271 11666555 1 16877 0 0 0 0 4096 4096 8 {1612543429 593102754} {1612543429 136093600} {1612543429 805107001} [0 0 0]}}"
time="2021-02-05T16:43:50Z" level=info msg="pid 23: &{root 4096 2147484141 {136093600 63748140229 0x2d10800} {2097271 11666555 1 16877 0 0 0 0 4096 4096 8 {1612543429 593102754} {1612543429 136093600} {1612543429 805107001} [0 0 0]}}"
time="2021-02-05T16:43:50Z" level=info msg="pid 23: &{root 4096 2147484141 {136093600 63748140229 0x2d10800} {2097271 11666555 1 16877 0 0 0 0 4096 4096 8 {1612543429 593102754} {1612543429 136093600} {1612543429 805107001} [0 0 0]}}"
time="2021-02-05T16:43:50Z" level=info msg="pid 23: &{root 4096 2147484141 {136093600 63748140229 0x2d10800} {2097271 11666555 1 16877 0 0 0 0 4096 4096 8 {1612543429 593102754} {1612543429 136093600} {1612543429 805107001} [0 0 0]}}"
time="2021-02-05T16:43:50Z" level=info msg="main container started with container ID: c1afed5dcc0e0e4224c5587dc5698e96c270a652199b4fadbdf788455e0d14e4"
time="2021-02-05T16:43:50Z" level=info msg="Starting annotations monitor"
time="2021-02-05T16:43:50Z" level=info msg="pid 23: &{root 4096 2147484141 {136093600 63748140229 0x2d10800} {2097271 11666555 1 16877 0 0 0 0 4096 4096 8 {1612543429 593102754} {1612543429 136093600} {1612543429 805107001} [0 0 0]}}"
time="2021-02-05T16:43:50Z" level=info msg="Main pid identified as 23"
time="2021-02-05T16:43:50Z" level=info msg="Successfully secured file handle on main container root filesystem"
time="2021-02-05T16:43:50Z" level=info msg="Waiting for main pid 23 to complete"
time="2021-02-05T16:43:50Z" level=info msg="Starting deadline monitor"
time="2021-02-05T16:43:50Z" level=info msg="Stopped root processes polling due to successful securing of main root fs"
time="2021-02-05T16:44:00Z" level=info msg="/argo/podmetadata/annotations updated"
time="2021-02-05T16:44:01Z" level=info msg="Main pid 23 completed"
time="2021-02-05T16:44:01Z" level=info msg="Main container completed"
time="2021-02-05T16:44:01Z" level=info msg="Saving logs"
time="2021-02-05T16:44:01Z" level=info msg="Annotations monitor stopped"
time="2021-02-05T16:44:01Z" level=info msg="Deadline monitor stopped"
time="2021-02-05T16:44:01Z" level=info msg="S3 Save path: /tmp/argo/outputs/logs/main.log, key: artifacts/demo-training-pipeline-bn97r/demo-training-pipeline-bn97r-3933567025/main.log"
time="2021-02-05T16:44:01Z" level=info msg="Creating minio client minio-service.kubeflow:9000 using static credentials"
time="2021-02-05T16:44:01Z" level=info msg="Saving from /tmp/argo/outputs/logs/main.log to s3 (endpoint: minio-service.kubeflow:9000, bucket: mlpipeline, key: artifacts/demo-training-pipeline-bn97r/demo-training-pipeline-bn97r-3933567025/main.log)"
time="2021-02-05T16:44:01Z" level=info msg="Saving output parameters"
time="2021-02-05T16:44:01Z" level=info msg="Saving path output parameter: deploy-gen-iris-image"
time="2021-02-05T16:44:01Z" level=info msg="Copying /tmp/image from base image layer"
time="2021-02-05T16:44:01Z" level=error msg="executor error: open /tmp/image: no such file or directory"
time="2021-02-05T16:44:01Z" level=info msg="Killing sidecars"
time="2021-02-05T16:44:01Z" level=info msg="Alloc=5083 TotalAlloc=13039 Sys=71104 NumGC=4 Goroutines=14"
time="2021-02-05T16:44:01Z" level=fatal msg="open /tmp/image: no such file or directory"

kubectl logs demo-training-pipeline-bn97r-3933567025 -n kubeflow main

Runtime:
{'kind': 'job',
 'metadata': {'name': 'gen-iris', 'project': 'sk-project', 'tag': ''},
 'spec': {'args': [],
          'build': {'base_image': 'mlrun/mlrun',
                    'commands': ['pip install sklearn', 'pip install pyarrow'],
                    'functionSourceCode': 'IyBHZW5lcmF0ZWQgYnkgbnVjbGlvLmV4cG9ydC5OdWNsaW9FeHBvcnRlcgoKaW1wb3J0IG9zCmZyb20gc2tsZWFybi5kYXRhc2V0cyBpbXBvcnQgbG9hZF9pcmlzCmZyb20gc2tsZWFybi5tb2RlbF9zZWxlY3Rpb24gaW1wb3J0IHRyYWluX3Rlc3Rfc3BsaXQKaW1wb3J0IG51bXB5IGFzIG5wCmZyb20gc2tsZWFybi5tZXRyaWNzIGltcG9ydCBhY2N1cmFjeV9zY29yZQpmcm9tIG1scnVuLmFydGlmYWN0cyBpbXBvcnQgVGFibGVBcnRpZmFjdCwgUGxvdEFydGlmYWN0CmltcG9ydCBwYW5kYXMgYXMgcGQKCmRlZiBpcmlzX2dlbmVyYXRvcihjb250ZXh0LCBmb3JtYXQ9J2NzdicpOgogICAgaXJpcyA9IGxvYWRfaXJpcygpCiAgICBpcmlzX2RhdGFzZXQgPSBwZC5EYXRhRnJhbWUoZGF0YT1pcmlzLmRhdGEsIGNvbHVtbnM9aXJpcy5mZWF0dXJlX25hbWVzKQogICAgaXJpc19sYWJlbHMgPSBwZC5EYXRhRnJhbWUoZGF0YT1pcmlzLnRhcmdldCwgY29sdW1ucz1bJ2xhYmVsJ10pCiAgICBpcmlzX2RhdGFzZXQgPSBwZC5jb25jYXQoW2lyaXNfZGF0YXNldCwgaXJpc19sYWJlbHNdLCBheGlzPTEpCiAgICAKICAgIGNvbnRleHQubG9nZ2VyLmluZm8oJ3NhdmluZyBpcmlzIGRhdGFmcmFtZSB0byB7fScuZm9ybWF0KGNvbnRleHQuYXJ0aWZhY3RfcGF0aCkpCiAgICBjb250ZXh0LmxvZ19kYXRhc2V0KCdpcmlzX2RhdGFzZXQnLCBkZj1pcmlzX2RhdGFzZXQsIGZvcm1hdD1mb3JtYXQsIGluZGV4PUZhbHNlKQoK'},
          'command': '',
          'default_handler': '',
          'description': '',
          'entry_points': {'iris_generator': {'doc': '',
                                              'lineno': 11,
                                              'name': 'iris_generator',
                                              'outputs': [{'default': ''}],
                                              'parameters': [{'default': '',
                                                              'name': 'context'},
                                                             {'default': 'csv',
                                                              'name': 'format'}]}},
          'env': [],
          'volume_mounts': [{'mountPath': '/home/jovyan/data',
                             'name': 'pvc-55208e8c-6cf1-483f-a107-bea804c96384'}],
          'volumes': [{'name': 'pvc-55208e8c-6cf1-483f-a107-bea804c96384',
                       'persistentVolumeClaim': {'claimName': 'mlrun-kit-jupyter-pvc'}}]}}
> 2021-02-05 16:43:51,999 [info] remote deployment started
> 2021-02-05 16:43:51,999 [error] database connection is not configured
> 2021-02-05 16:43:51,999 [info] building image (.falkonryml/func-sk-project-gen-iris-latest)
FROM mlrun/mlrun:0.5.5-rc3
RUN pip install sklearn
RUN pip install pyarrow

> 2021-02-05 16:43:52,000 [info] using in-cluster config.
> 2021-02-05 16:43:52,019 [info] Pod mlrun-build-gen-iris-t2zh7 created
...
E0205 16:43:57.851534       1 aws_credentials.go:77] while getting AWS credentials NoCredentialProviders: no valid providers in chain. Deprecated.
	For verbose messaging see aws.Config.CredentialsChainVerboseErrors
error checking push permissions -- make sure you entered the correct tag name, and that you are authenticated correctly, and try again: checking push permission for "index.docker.io/falkonryml/func-sk-project-gen-iris-latest": POST https://index.docker.io/v2/falkonryml/func-sk-project-gen-iris-latest/blobs/uploads/: UNAUTHORIZED: authentication required; [map[Action:pull Class: Name:falkonryml/func-sk-project-gen-iris-latest Type:repository] map[Action:push Class: Name:falkonryml/func-sk-project-gen-iris-latest Type:repository]]
> 2021-02-05 16:43:59,926 [error] pod exited with error
> 2021-02-05 16:43:59,927 [info] build completed with failed
deploy error,  build failed!

_Originally posted by @Vin-itall in https://github.com/mlrun/mlrun/issues/244#issuecomment-774160009_

Issue Analytics

  • State: open
  • Created 3 years ago
  • Comments: 33

Top GitHub Comments

2 reactions
Hedingber commented, Feb 14, 2021

@Vin-itall Okay, I know what it is. Generally speaking, the whole pipelines area had several changes recently, and while it does work smoothly in the enterprise version of Iguazio, it looks like we have some breakages happening in the open source. I will push to have this one and the previous one fixed soon, and will update you when it’s done. Thanks again for the patience and cooperation!

1 reaction
Hedingber commented, Feb 10, 2021

@Vin-itall Thanks a lot for the patience and all the details. I think I found the problem, and it is indeed a bug in the code. When you’re running a build from a pipeline, this line makes it run directly from the pipeline pod instead of going through the API. Unlike the API pod, the pipeline pod doesn’t (and shouldn’t) have all the env vars set. In your case, DEFAULT_DOCKER_SECRET is not set, so no secret is mounted on the kaniko pod, and therefore authorization fails. The fix should be fairly simple: just make the build happen through the API. I will do it tomorrow. You can see that if you run this example notebook (you already have it in the helm chart’s Jupyter under the examples directory), the build will succeed (it’s not building as part of a pipeline).
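
The diagnosis above can be checked with a small script, sketched here under assumptions: the helper name missing_build_settings is hypothetical, and the sample environment copies the relevant entries from the env list in the wait container log earlier in the issue, where DEFAULT_DOCKER_REGISTRY is present but DEFAULT_DOCKER_SECRET is not.

```python
import os

# The two settings the maintainers say must both be present for the build.
REQUIRED_BUILD_SETTINGS = ("DEFAULT_DOCKER_REGISTRY", "DEFAULT_DOCKER_SECRET")


def missing_build_settings(env=None):
    """Return the required build settings that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_BUILD_SETTINGS if not env.get(name)]


# Relevant entries of the pipeline pod's environment, as shown in the
# executor log of this issue (no DEFAULT_DOCKER_SECRET entry appears there).
pipeline_pod_env = {
    "DEFAULT_DOCKER_REGISTRY": "index.docker.io/falkonryml",
    "MLRUN_ARTIFACT_PATH": "/home/jovyan/demos/scikit-learn-pipeline/pipe/"
    "2536a5ff-5740-44d8-96f8-26316be0611c",
}

missing = missing_build_settings(pipeline_pod_env)
print(missing)
```

For the environment from the log this prints ['DEFAULT_DOCKER_SECRET'], which matches the explanation that no secret reaches the kaniko pod. Calling missing_build_settings() with no argument inspects the environment of whatever process runs it, for example inside the pipeline pod.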
