
API for attaching user metadata to the execution plan and event

See original GitHub issue

Feature

Allow passing custom key-value pairs from a Spark job so that they are sent along with the lineage data, either in the ExecutionPlan or the ExecutionEvent. This would be a powerful feature, letting users attach their own metadata to the lineage. I am not sure whether this feature already exists: ExecutionPlan has a property extraInfo: Map[String, Any] = Map.empty that looks like it may be intended for this purpose.

Background

The current immediate requirement is to have JobId and RunId passed as part of lineage data.

JobId: essentially a unique name for the notebook that runs as a job. On Azure Databricks the applicationName and applicationId are autogenerated; they are cluster-specific, not "job"-specific.

RunId: unique per run of a job. If a job contains two write operations, two ExecutionPlans are generated, and there is no way I can see to tell whether the two ExecutionPlans come from the same job running once (two writes) or from the job running twice (a single write each).
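
To make the requirement concrete, here is a sketch of how both values could be attached using the provider API the maintainer proposes later in this thread. The type names (NoopUserExtraMetaDataProvider, ExecutionPlan, HarvestingContext) come from that example; Spline imports are omitted, and sourcing the IDs from custom Spark conf keys is purely a hypothetical convention:

```scala
// Sketch only: reading the IDs from custom Spark conf keys ("my.jobId",
// "my.runId") is an assumption, not part of Spline. NoopUserExtraMetaDataProvider
// lets us override only the hook we need.
class JobRunIdProvider(sparkConf: org.apache.spark.SparkConf)
    extends NoopUserExtraMetaDataProvider {

  override def forExecPlan(plan: ExecutionPlan, ctx: HarvestingContext): Map[String, Any] =
    Map(
      "jobId" -> sparkConf.get("my.jobId", "unknown"), // stable notebook/job name
      "runId" -> sparkConf.get("my.runId", "unknown")  // unique per run, shared by all writes in it
    )
}
```

With this, two ExecutionPlans sharing the same runId would be distinguishable from the same job having run twice.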

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

1 reaction
wajda commented, May 17, 2020

New API example:

spark.enableLineageTracking(new DefaultSplineConfigurer(conf) {
  override protected def userExtraMetadataProvider = new UserExtraMetaDataProvider {
    override def forExecEvent(event: ExecutionEvent, ctx: HarvestingContext): Map[String, Any] = Map("foo" -> "bar")
    override def forExecPlan(plan: ExecutionPlan, ctx: HarvestingContext): Map[String, Any] = Map("foo" -> "bar")
    override def forOperation(op: ReadOperation, ctx: HarvestingContext): Map[String, Any] = Map("foo" -> "bar")
    override def forOperation(op: WriteOperation, ctx: HarvestingContext): Map[String, Any] = Map("foo" -> "bar")
    override def forOperation(op: DataOperation, ctx: HarvestingContext): Map[String, Any] = Map("foo" -> "bar")
  }
})

There is also a NoopUserExtraMetaDataProvider class with all forXXXX() methods returning an empty Map. You can extend that class and override only the methods you need.

For codeless mode, the following property can be used to instantiate a custom UserExtraMetaDataProvider:

spline.user_extra_meta_provider.className=com.my.FooBarExtraMetaDataProvider
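
A minimal sketch of what such a class might look like, extending the NoopUserExtraMetaDataProvider mentioned above; Spline imports are omitted, and the assumption that codeless mode instantiates the class reflectively through a public no-arg constructor is not confirmed in this thread:

```scala
package com.my

// Sketch: overrides only the event hook; all other forXXXX() methods inherit
// the empty-Map behaviour from NoopUserExtraMetaDataProvider. A public no-arg
// constructor is assumed for instantiation via the className property.
class FooBarExtraMetaDataProvider extends NoopUserExtraMetaDataProvider {
  override def forExecEvent(event: ExecutionEvent, ctx: HarvestingContext): Map[String, Any] =
    Map("foo" -> "bar")
}
```
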
1 reaction
ankitbko commented, Mar 12, 2020

Just updating… got it working. I had to implement StandardSplineConfigurationStack myself, as it is not present in 0.4 and the same logic sits in a private function.

For additional flexibility, I also read a single property from the SparkConf containing semicolon-separated keys, which are then resolved and passed along with the ExecutionPlan.
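
That workaround might look roughly like this; the property name, and the idea of resolving each listed key against the Spark conf, are assumptions based on the comment above (Spline imports omitted):

```scala
// Sketch of the described workaround: one Spark conf property holds a
// semicolon-separated list of keys; each key is looked up in the conf and
// forwarded as execution-plan metadata. Property names are hypothetical.
class ConfKeysMetadataProvider(sparkConf: org.apache.spark.SparkConf)
    extends NoopUserExtraMetaDataProvider {

  override def forExecPlan(plan: ExecutionPlan, ctx: HarvestingContext): Map[String, Any] =
    sparkConf.get("spline.extra.confKeys", "")       // e.g. "my.jobId;my.runId"
      .split(';')
      .map(_.trim)
      .filter(_.nonEmpty)
      .flatMap(key => sparkConf.getOption(key).map(value => key -> value))
      .toMap
}
```

Keys missing from the conf are simply skipped rather than emitted with empty values.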

Read more comments on GitHub
