[Kubeflow Dex Distribution] KF Pipelines 100% Unusable - MULTIPLE PEOPLE REPORTING

What steps did you take:

KFP in KF 1.2.0 with Dex on K8s 1.18.9 does not work. I receive an error in the KF dashboard when attempting to view pipelines:

Error: failed to retrieve list of pipelines. Click Details for more information. -> An error occured, no healthy upstream

What happened:

Installed Kubeflow 1.2.0 on-prem as per installation instructions. Any attempt to see pipelines or use pipelines fails.

What did you expect to happen:

I expected to be able to use Pipelines

Environment:

Kubernetes version: 1.18.9
Kubeflow version: 1.2.0
Installed with Dex, configured after deploy to use LDAP.

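The ml-pipeline pod fails to start completely (1/2 containers ready). Its logs indicate repeated "unexpected EOF" errors on the MySQL connection; the full output is included further down in this report.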

How did you deploy Kubeflow Pipelines (KFP)?

Installed Kubeflow Pipelines as part of Kubeflow installation for on-prem with dex.

KFP version: 1.0.4

KFP SDK version: I HAVEN’T GOTTEN FAR ENOUGH TO USE THIS!

Anything else you would like to add:

ml-pipeline pod refuses to run:

$ kubectl get pods -n kubeflow
NAME                                                     READY   STATUS    RESTARTS   AGE
admission-webhook-bootstrap-stateful-set-0               1/1     Running   0          4d20h
admission-webhook-deployment-5d9ccb5696-f6zs6            1/1     Running   0          4d20h
application-controller-stateful-set-0                    1/1     Running   0          4d21h
argo-ui-684bcb587f-z84nh                                 1/1     Running   0          4d16h
cache-deployer-deployment-6667847478-7h2w8               2/2     Running   2          4d21h
cache-server-bd9c859db-755zj                             2/2     Running   527        4d21h
centraldashboard-895c4c768-46xgc                         1/1     Running   0          4d21h
jupyter-web-app-deployment-6588c6f544-c5m45              1/1     Running   0          3d3h
katib-controller-75c8d47f8c-5k2tr                        1/1     Running   0          4d21h
katib-db-manager-6c88c68d79-cgxdh                        1/1     Running   0          4d16h
katib-mysql-858f68f588-zvhnj                             1/1     Running   0          4d21h
katib-ui-68f59498d4-bkscp                                1/1     Running   0          4d21h
kfserving-controller-manager-0                           2/2     Running   0          36h
kubeflow-pipelines-profile-controller-69c94df75b-xtpfj   1/1     Running   0          4d21h
metacontroller-0                                         1/1     Running   0          4d21h
metadata-db-757dc9c7b5-pt75k                             1/1     Running   0          4d21h
metadata-envoy-deployment-6ff58757f6-57pjc               1/1     Running   0          4d21h
metadata-grpc-deployment-76d69f69c8-xcmjk                1/1     Running   3          4d21h
metadata-writer-6d94ffb7df-mhnxj                         2/2     Running   1          4d21h
minio-66c9cd74c9-jrss8                                   1/1     Running   0          4d21h
ml-pipeline-54989c9946-s2f46                             1/2     Running   926        4d21h
ml-pipeline-persistenceagent-7f6bf7646-ldct6             2/2     Running   0          4d21h
ml-pipeline-scheduledworkflow-66db7bcf5d-q244j           2/2     Running   0          4d16h
ml-pipeline-ui-756b58fb-gpwn9                            2/2     Running   0          4d21h
ml-pipeline-viewer-crd-58f59f87db-dmj2l                  2/2     Running   2          4d21h
ml-pipeline-visualizationserver-6f9ff4974-k4cf9          2/2     Running   0          4d21h
mpi-operator-77bb5d8f4b-w4dhj                            1/1     Running   0          4d21h
mxnet-operator-68b688bb69-b5985                          1/1     Running   0          4d16h
mysql-7694c6b8b7-jthp6                                   2/2     Running   0          4d17h
notebook-controller-deployment-58447d4b4c-6ll57          1/1     Running   0          4d21h
profiles-deployment-78d4549cbc-z9lld                     2/2     Running   0          4d21h
pytorch-operator-b79799447-f8nnl                         1/1     Running   0          4d21h
seldon-controller-manager-5fc5dfc86c-nh2qm               1/1     Running   0          4d21h
spark-operatorsparkoperator-67c6bc65fb-8tgn5             1/1     Running   0          4d21h
tf-job-operator-5c97f4bf7-g5vtw                          1/1     Running   0          4d21h
workflow-controller-5c7cc7976d-5n6tb                     1/1     Running   0          4d16h
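
In a listing this long the failing pods are easy to miss, since every row reports Running. The following is a minimal sketch, not part of the original report, that parses the default kubectl get pods table and flags pods that are not fully ready or that restart excessively. The names parse_pods and suspicious and the threshold of 100 restarts are assumptions chosen for illustration.

```python
from typing import List, NamedTuple


class PodStatus(NamedTuple):
    name: str
    ready: int
    total: int
    status: str
    restarts: int


def parse_pods(table: str) -> List[PodStatus]:
    """Parse the default `kubectl get pods` table (NAME READY STATUS RESTARTS AGE)."""
    pods = []
    for line in table.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and anything that is not a pod row.
        if len(fields) < 5 or fields[0] == "NAME" or "/" not in fields[1]:
            continue
        ready, total = fields[1].split("/")
        pods.append(
            PodStatus(fields[0], int(ready), int(total), fields[2], int(fields[3]))
        )
    return pods


def suspicious(pods: List[PodStatus], max_restarts: int = 100) -> List[PodStatus]:
    """Return pods that are not fully ready or exceed the (arbitrary) restart limit."""
    return [p for p in pods if p.ready < p.total or p.restarts > max_restarts]


# Three rows copied from the listing above.
SAMPLE = """\
NAME                                READY   STATUS    RESTARTS   AGE
cache-server-bd9c859db-755zj        2/2     Running   527        4d21h
ml-pipeline-54989c9946-s2f46        1/2     Running   926        4d21h
mysql-7694c6b8b7-jthp6              2/2     Running   0          4d17h
"""

if __name__ == "__main__":
    for pod in suspicious(parse_pods(SAMPLE)):
        print(f"{pod.name}: {pod.ready}/{pod.total} ready, {pod.restarts} restarts")
```

Applied to the full listing, this would single out cache-server (527 restarts) and ml-pipeline (1/2 ready, 926 restarts), the two components whose logs follow.
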
$ kubectl logs -n kubeflow ml-pipeline-54989c9946-s2f46 ml-pipeline-api-server 
I0301 20:22:00.153656       6 client_manager.go:134] Initializing client manager
I0301 20:22:00.153817       6 config.go:50] Config DBConfig.ExtraParams not specified, skipping
[mysql] 2021/03/01 20:22:01 packets.go:36: unexpected EOF
[mysql] 2021/03/01 20:22:02 packets.go:36: unexpected EOF
[mysql] 2021/03/01 20:22:04 packets.go:36: unexpected EOF
[mysql] 2021/03/01 20:22:07 packets.go:36: unexpected EOF
[mysql] 2021/03/01 20:22:10 packets.go:36: unexpected EOF
[mysql] 2021/03/01 20:22:13 packets.go:36: unexpected EOF
[mysql] 2021/03/01 20:22:16 packets.go:36: unexpected EOF
[mysql] 2021/03/01 20:22:23 packets.go:36: unexpected EOF
$ kubectl logs -n kubeflow mysql-7694c6b8b7-jthp6 mysql
...
MySQL init process done. Ready for start up.

2021-02-25 03:04:17 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2021-02-25 03:04:17 0 [Note] mysqld (mysqld 5.6.44) starting as process 1 ...
2021-02-25 03:04:17 1 [Note] Plugin 'FEDERATED' is disabled.
2021-02-25 03:04:17 1 [Note] InnoDB: Using atomics to ref count buffer pool pages
2021-02-25 03:04:17 1 [Note] InnoDB: The InnoDB memory heap is disabled
2021-02-25 03:04:17 1 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2021-02-25 03:04:17 1 [Note] InnoDB: Memory barrier is not used
2021-02-25 03:04:17 1 [Note] InnoDB: Compressed tables use zlib 1.2.11
2021-02-25 03:04:17 1 [Note] InnoDB: Using Linux native AIO
2021-02-25 03:04:17 1 [Note] InnoDB: Using CPU crc32 instructions
2021-02-25 03:04:17 1 [Note] InnoDB: Initializing buffer pool, size = 128.0M
2021-02-25 03:04:17 1 [Note] InnoDB: Completed initialization of buffer pool
2021-02-25 03:04:17 1 [Note] InnoDB: Highest supported file format is Barracuda.
2021-02-25 03:04:17 1 [Note] InnoDB: 128 rollback segment(s) are active.
2021-02-25 03:04:17 1 [Note] InnoDB: Waiting for purge to start
2021-02-25 03:04:17 1 [Note] InnoDB: 5.6.44 started; log sequence number 1625997
2021-02-25 03:04:17 1 [Note] Server hostname (bind-address): '*'; port: 3306
2021-02-25 03:04:17 1 [Note] IPv6 is available.
2021-02-25 03:04:17 1 [Note]   - '::' resolves to '::';
2021-02-25 03:04:17 1 [Note] Server socket created on IP: '::'.
2021-02-25 03:04:17 1 [Warning] Insecure configuration for --pid-file: Location '/var/run/mysqld' in the path is accessible to all OS users. Consider choosing a different directory.
2021-02-25 03:04:17 1 [Warning] 'proxies_priv' entry '@ root@mysql-7694c6b8b7-jthp6' ignored in --skip-name-resolve mode.
2021-02-25 03:04:17 1 [Note] Event Scheduler: Loaded 0 events
2021-02-25 03:04:17 1 [Note] mysqld: ready for connections.
Version: '5.6.44'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  MySQL Community Server (GPL)
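
The two logs above disagree: mysqld reports it is ready for connections, while the API server's MySQL client keeps hitting "unexpected EOF". One way to narrow this down is to check whether the server greeting ever reaches the client. In the MySQL protocol the server speaks first, sending a handshake packet right after the TCP connection is established, so a connection that closes before any bytes arrive points at something between the two pods rather than at mysqld itself. The sketch below is an illustration under that assumption, not a tool from the original report; the function names and the service host in the usage note are hypothetical.

```python
import socket


def classify_greeting(data: bytes) -> str:
    """Classify the first bytes received after connecting to a MySQL port."""
    if not data:
        # The peer closed the connection without sending anything; a client
        # waiting for the handshake would likely see this as an unexpected EOF.
        return "closed-before-greeting"
    if len(data) < 5:
        return "truncated"
    # Bytes 0-2: payload length (little endian), byte 3: sequence id,
    # byte 4: first payload byte (protocol version, or 0xFF for an error packet).
    first_payload_byte = data[4]
    if first_payload_byte == 0x0A:
        return "mysql-handshake-v10"
    if first_payload_byte == 0xFF:
        return "mysql-error-packet"
    return "not-mysql"


def probe(host: str, port: int = 3306, timeout: float = 5.0) -> str:
    """Connect to host:port and report what kind of greeting, if any, arrives."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            return classify_greeting(sock.recv(64))
    except socket.timeout:
        return "timeout"
    except OSError as exc:  # refused, reset, DNS failure, ...
        return f"connect-error: {exc}"
```

Running probe("mysql.kubeflow.svc.cluster.local") (an assumed service name) from a pod with an Istio sidecar and again from one without it would show whether the greeting is lost only on the sidecar path.
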

The cache server is also unable to connect to MySQL:

$ kubectl logs -n kubeflow cache-server-bd9c859db-755zj  server 
2021/03/01 20:19:21 Initing client manager....
[mysql] 2021/03/01 20:19:22 packets.go:36: unexpected EOF
[mysql] 2021/03/01 20:19:24 packets.go:36: unexpected EOF
[mysql] 2021/03/01 20:19:25 packets.go:36: unexpected EOF
[mysql] 2021/03/01 20:19:27 packets.go:36: unexpected EOF
[mysql] 2021/03/01 20:19:30 packets.go:36: unexpected EOF
[mysql] 2021/03/01 20:19:33 packets.go:36: unexpected EOF
[mysql] 2021/03/01 20:19:39 packets.go:36: unexpected EOF
[mysql] 2021/03/01 20:19:46 packets.go:36: unexpected EOF
[mysql] 2021/03/01 20:19:55 packets.go:36: unexpected EOF
[mysql] 2021/03/01 20:20:07 packets.go:36: unexpected EOF
[mysql] 2021/03/01 20:20:26 packets.go:36: unexpected EOF
[mysql] 2021/03/01 20:21:02 packets.go:36: unexpected EOF
[mysql] 2021/03/01 20:21:40 packets.go:36: unexpected EOF
[mysql] 2021/03/01 20:22:35 packets.go:36: unexpected EOF
[mysql] 2021/03/01 20:23:58 packets.go:36: unexpected EOF
[mysql] 2021/03/01 20:25:09 packets.go:36: unexpected EOF
[mysql] 2021/03/01 20:25:50 packets.go:36: unexpected EOF
[mysql] 2021/03/01 20:25:51 packets.go:36: unexpected EOF
[mysql] 2021/03/01 20:25:52 packets.go:36: unexpected EOF
[mysql] 2021/03/01 20:25:54 packets.go:36: unexpected EOF
[mysql] 2021/03/01 20:25:56 packets.go:36: unexpected EOF
[mysql] 2021/03/01 20:25:59 packets.go:36: unexpected EOF
[mysql] 2021/03/01 20:26:02 packets.go:36: unexpected EOF
[mysql] 2021/03/01 20:26:06 packets.go:36: unexpected EOF
[mysql] 2021/03/01 20:26:15 packets.go:36: unexpected EOF
[mysql] 2021/03/01 20:26:20 packets.go:36: unexpected EOF
[mysql] 2021/03/01 20:26:34 packets.go:36: unexpected EOF
[mysql] 2021/03/01 20:27:03 packets.go:36: unexpected EOF
[mysql] 2021/03/01 20:27:45 packets.go:36: unexpected EOF
[mysql] 2021/03/01 20:28:11 packets.go:36: unexpected EOF
[mysql] 2021/03/01 20:29:39 packets.go:36: unexpected EOF
[mysql] 2021/03/01 20:30:12 packets.go:36: unexpected EOF
[mysql] 2021/03/01 20:31:32 packets.go:36: unexpected EOF
[mysql] 2021/03/01 20:32:07 packets.go:36: unexpected EOF
F0301 20:32:07.437107       1 error.go:305] invalid connection
goroutine 1 [running]:
github.com/golang/glog.stacks(0xc000786600, 0xc0004790a0, 0x3f, 0x40)
	/go/pkg/mod/github.com/golang/glog@v0.0.0-20160126235308-23def4e6c14b/glog.go:769 +0xd4
github.com/golang/glog.(*loggingT).output(0x237c4c0, 0xc000000003, 0xc000479080, 0x20d8f16, 0x8, 0x131, 0x0)
	/go/pkg/mod/github.com/golang/glog@v0.0.0-20160126235308-23def4e6c14b/glog.go:720 +0x329
github.com/golang/glog.(*loggingT).printf(0x237c4c0, 0x3, 0x14ca0b3, 0x2, 0xc0006c58f8, 0x1, 0x1)
	/go/pkg/mod/github.com/golang/glog@v0.0.0-20160126235308-23def4e6c14b/glog.go:655 +0x14b
github.com/golang/glog.Fatalf(0x14ca0b3, 0x2, 0xc0006c58f8, 0x1, 0x1)
	/go/pkg/mod/github.com/golang/glog@v0.0.0-20160126235308-23def4e6c14b/glog.go:1148 +0x67
github.com/kubeflow/pipelines/backend/src/common/util.TerminateIfError(0x1649b00, 0xc0005eca40)
	/go/src/github.com/kubeflow/pipelines/backend/src/common/util/error.go:305 +0x79
main.initMysql(0x7ffefc6905bd, 0x5, 0x7ffefc6905cd, 0x5, 0x7ffefc6905dd, 0x4, 0x7ffefc6905ec, 0x7, 0x7ffefc6905fe, 0x4, ...)
	/go/src/github.com/kubeflow/pipelines/backend/src/cache/client_manager.go:157 +0x466
main.initDBClient(0x7ffefc6905bd, 0x5, 0x7ffefc6905cd, 0x5, 0x7ffefc6905dd, 0x4, 0x7ffefc6905ec, 0x7, 0x7ffefc6905fe, 0x4, ...)
	/go/src/github.com/kubeflow/pipelines/backend/src/cache/client_manager.go:71 +0x599
main.(*ClientManager).init(0xc0006c5db8, 0x7ffefc6905bd, 0x5, 0x7ffefc6905cd, 0x5, 0x7ffefc6905dd, 0x4, 0x7ffefc6905ec, 0x7, 0x7ffefc6905fe, ...)
	/go/src/github.com/kubeflow/pipelines/backend/src/cache/client_manager.go:57 +0x80
main.NewClientManager(0x7ffefc6905bd, 0x5, 0x7ffefc6905cd, 0x5, 0x7ffefc6905dd, 0x4, 0x7ffefc6905ec, 0x7, 0x7ffefc6905fe, 0x4, ...)
	/go/src/github.com/kubeflow/pipelines/backend/src/cache/client_manager.go:169 +0xab
main.main()
	/go/src/github.com/kubeflow/pipelines/backend/src/cache/main.go:71 +0x367

Attempted suggestions for repair (ALL fail - please do not suggest)

  1. Istio: change the TLS mode from ISTIO_MUTUAL to DISABLE. This allows the MySQL database to be populated, but the KFP UI will NOT start up.
  2. Istio: switch mTLS between STRICT and PERMISSIVE. Pipelines and Jupyter Notebooks will not come up.
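
For readers unfamiliar with the first attempt, the sketch below shows roughly what setting the TLS mode to DISABLE looks like as an Istio DestinationRule. The report does not include the actual manifest that was applied, so the resource name, the host, and the helper function are assumptions for illustration only. As stated above, this change did not resolve the problem.

```python
import json

# TLS modes accepted by an Istio DestinationRule's trafficPolicy.tls.mode field.
ALLOWED_TLS_MODES = {"DISABLE", "SIMPLE", "MUTUAL", "ISTIO_MUTUAL"}


def mysql_destination_rule(
    host: str, namespace: str = "kubeflow", tls_mode: str = "DISABLE"
) -> dict:
    """Build a DestinationRule that sets the client-side TLS mode for `host`."""
    if tls_mode not in ALLOWED_TLS_MODES:
        raise ValueError(f"unsupported TLS mode: {tls_mode}")
    return {
        "apiVersion": "networking.istio.io/v1alpha3",
        "kind": "DestinationRule",
        # Hypothetical resource name; not taken from the report.
        "metadata": {"name": "mysql-tls-mode", "namespace": namespace},
        "spec": {
            "host": host,
            "trafficPolicy": {"tls": {"mode": tls_mode}},
        },
    }


if __name__ == "__main__":
    # Assumed service host; adjust to the actual MySQL service in the cluster.
    rule = mysql_destination_rule("mysql.kubeflow.svc.cluster.local")
    # kubectl accepts JSON manifests, so the output can be piped to
    # `kubectl apply -f -`.
    print(json.dumps(rule, indent=2))
```
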

The product as advertised online does not work on a vanilla on-prem K8s installation. It appears to work on GCP, Azure, AWS, and possibly IBM.

Provided diagnostic tools are not compatible with an on-prem installation:

$ kfp diagnose_me
Google Cloud SDK is not installed, gcloud, gsutil and kubectl are required for this app to run. Please follow instructions at https://cloud.google.com/sdk/install to install the SDK.
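
The error message names three command-line tools as prerequisites, two of which (gcloud and gsutil) belong to the Google Cloud SDK and are normally absent on an on-prem machine. The following sketch is not the kfp implementation; it only shows, with the standard library, how to check which of the tools named in the message are missing from PATH.

```python
import shutil
from typing import Iterable, List

# Tools named in the error message above.
REQUIRED_TOOLS = ("gcloud", "gsutil", "kubectl")


def missing_tools(tools: Iterable[str]) -> List[str]:
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("All tools named in the error message are installed.")
```
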

/kind bug

Issue Analytics

  • State: open
  • Created 3 years ago
  • Comments: 12 (5 by maintainers)

Top GitHub Comments

2 reactions
Shaked commented, Jun 3, 2021

@Bobgy is there a solution for that?

1 reaction
Bobgy commented, Mar 19, 2021

You are right, I was only referring to the distribution

Top Results From Across the Web

Multi-user Isolation - Kubeflow Pipelines
Kubeflow Pipelines separates resources using Kubernetes namespaces that are managed by Kubeflow's Profile resources. Other users cannot see ...

Accelerating Machine Learning App Development ... - YouTube
... and run such multi-step pipelines, without sacrificing rapid prototyping and experimentation. Find out how running Kubeflow on Google ...

Enabling Multi-user Machine Learning Workflows for Kubeflow ...
Enabling Multi-user Machine Learning Workflows for Kubeflow Pipelines - Yannis Zarkadas & Yuan Gong.

Build and deploy a scalable machine learning system on ...
You can use this Kubeflow distribution to build ML systems on top of Amazon ... Secure authentication of Kubeflow users with Amazon Cognito....

Machine Learning at scale: first impressions of Kubeflow
ML models might require data that's currently unavailable in production. An analyst may easily calculate the difference between a transaction's ...
