[backend] Metadata/Executions not written in 1.7.0-rc1 (New visualisations not working as a result)

/kind bug

I upgraded Kubeflow from 1.4.0 to 1.7.0-rc1 with the platform-agnostic manifests.

While I now see correct visualizations of statistics from runs that happened before upgrading to 1.7.0-rc1, new runs only display the markdown details.

The TFX pipelines I submit are exactly the same. On the new runs the ML Metadata tab of the components prints:

“Corresponding ML Metadata not found.”

Furthermore I don’t see any new executions on the executions page despite running many pipelines since upgrading.

I don’t see anything special in the logs of the TFX pods except:

WARNING:absl:metadata_connection_config is not provided by IR.

But that was present before upgrading to 1.7.0-rc1.

The only error I see in the metadata-grpc-deployment pod is:

name: "sp-lstm-rh6xt"
Internal: mysql_query failed: errno: 1062, error: Duplicate entry '48-sp-lstm-rh6xt' for key 'type_id'
	Cannot create node for type_id: 48 name: "sp-lstm-rh6xt"

Which I also think is normal?

Basically I don’t think executions and artifacts are getting written to the DB for some reason in 1.7.0-rc1. Not sure how to debug this. This causes the visualizations to not show up as far as I can see.

Metadata in the TFX pipelines is configured via the get_default_kubeflow_metadata_config function from tfx.orchestration.kubeflow.

Environment:

  • Kubeflow version: 1.4.0 -> 1.7.0-rc1
  • kfctl version: Not used. Using tfx.orchestration.kubeflow to submit pipelines.
  • Kubernetes platform: Upstream kubeadm: k8s v1.20.5
  • Kubernetes version: (use kubectl version):
  • OS (e.g. from /etc/os-release): CentOS 8

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Comments: 28 (17 by maintainers)

Top GitHub Comments

jiyongjung0 commented, Aug 13, 2021 (3 reactions)

I found a possible cause. It is related to a change in how TFX stores its contexts since 1.0.0 (which in turn comes from the changes in the execution stack to use TFX IR).

In TFX 0.X, the contexts were

  • type:pipeline, value: “my-pipeline”
  • type:run, value: “my-pipeline.my-pipeline-xaew1” (some hash is appended in the second part.)
  • type:component_run, value: “my-pipeline.my-pipeline-xaew1.CsvExampleGen”

Related code

However, in TFX 1.0, the contexts became

  • type:pipeline, value: “my-pipeline”
  • type:pipeline_run, value: “my-pipeline-xaew1”
  • type:node, value: “my-pipeline.CsvExampleGen”

Related code

So it seems that Kubeflow Pipelines cannot find the contexts (and therefore the artifacts) properly. I think that we should change the MLMD access code, like here.
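
As a rough illustration of the two naming schemes listed above, the following sketch builds the context type/value pairs for each TFX version. The helper names `tfx_0x_context_names` and `tfx_1x_context_names` are hypothetical and are not part of TFX or Kubeflow Pipelines; only the type names and value patterns are taken from the lists above.

```python
from typing import Dict


def tfx_0x_context_names(pipeline: str, run_id: str, component: str) -> Dict[str, str]:
    """Context type -> context name, following the TFX 0.X pattern above."""
    return {
        "pipeline": pipeline,
        "run": f"{pipeline}.{run_id}",
        "component_run": f"{pipeline}.{run_id}.{component}",
    }


def tfx_1x_context_names(pipeline: str, run_id: str, component: str) -> Dict[str, str]:
    """Context type -> context name, following the TFX 1.0 pattern above."""
    return {
        "pipeline": pipeline,
        "pipeline_run": run_id,
        "node": f"{pipeline}.{component}",
    }


old = tfx_0x_context_names("my-pipeline", "my-pipeline-xaew1", "CsvExampleGen")
new = tfx_1x_context_names("my-pipeline", "my-pipeline-xaew1", "CsvExampleGen")

# Only the "pipeline" context type is shared between the two schemes.
shared_types = sorted(set(old) & set(new))
print(shared_types)
```

A lookup written against the 0.X type names ("run", "component_run") would therefore find nothing for a pipeline run recorded with the 1.0 scheme, which would be consistent with the “Corresponding ML Metadata not found.” message.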

CC @zhitaoli, @1025KB, @Bobgy

jiyongjung0 commented, Aug 13, 2021 (2 reactions)

Unfortunately, it seems that there is no direct clue about the TFX version when finding executions. (Artifacts have a tfx_version property, but there is no such information in Context / Execution.)

I think that we can try to find the 1.0 context first, and fall back to the 0.X context if it is not found.
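
A minimal sketch of that fallback, assuming the contexts are available as a plain dictionary keyed by (type, name). The dictionary stands in for the metadata store and `find_run_context_id` is a hypothetical helper, not the actual Kubeflow Pipelines code; a real implementation would query MLMD by context type and name instead.

```python
from typing import Dict, Optional, Tuple

ContextKey = Tuple[str, str]  # (context type, context name)


def find_run_context_id(
    contexts: Dict[ContextKey, int], pipeline: str, run_id: str
) -> Optional[int]:
    """Return the run context id, trying the TFX 1.0 scheme before the 0.X one."""
    candidates = [
        ("pipeline_run", run_id),          # TFX 1.0 scheme
        ("run", f"{pipeline}.{run_id}"),   # TFX 0.X scheme (fallback)
    ]
    for key in candidates:
        if key in contexts:
            return contexts[key]
    return None


# Illustrative in-memory stores; the ids are made up for the example.
old_store = {("run", "my-pipeline.my-pipeline-xaew1"): 7}
new_store = {("pipeline_run", "my-pipeline-xaew1"): 12}

print(find_run_context_id(old_store, "my-pipeline", "my-pipeline-xaew1"))
print(find_run_context_id(new_store, "my-pipeline", "my-pipeline-xaew1"))
```

Trying the newer scheme first keeps the common case to a single lookup once most runs come from TFX 1.0 or later, while runs recorded before the upgrade remain reachable through the fallback.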
