API for attaching user metadata to the execution plan and event
Issue Description
Feature
Allow passing custom key-value pairs from a Spark job so that they are sent along with the lineage data, either in the `executionPlan` or the `executionEvent`. This would be a powerful feature, letting users attach their own metadata to the lineage. I am not sure whether this feature already exists, as I can see a property `extraInfo: Map[String, Any] = Map.empty` in `ExecutionPlan` which looks like it may be intended for this purpose.
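For example (the key names here are invented purely for illustration), the kind of payload a job would want to attach could look like this:

```scala
// Hypothetical custom metadata a job might want to send with its lineage,
// e.g. under ExecutionPlan.extraInfo:
val customLineageMetadata: Map[String, Any] = Map(
  "jobId" -> "daily-sales-notebook",  // stable name of the job/notebook
  "runId" -> "run-2020-06-01-0001",   // unique per execution of the job
  "owner" -> "data-platform-team"
)
```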
Background
The current immediate requirement is to have JobId and RunId passed as part of lineage data.
`JobId`: essentially just a unique name for the notebook that runs as a job. On Azure Databricks the `applicationName` and `applicationId` are autogenerated; they are cluster-specific, not "job"-specific.
`RunId`: unique for a single run of a job. If there are two write operations in the job, then two `executionPlan`s are generated. I can see no way to tell whether the two `executionPlan`s come from the same job running once (meaning there were two writes) or from the job running twice (with a single write each).
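One way to make these identifiers available to the lineage code is to put them into the Spark session configuration when the job starts; the property names below are made up for this sketch, and any custom metadata provider could read them back later:

```scala
import java.util.UUID
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// Hypothetical property names: a stable identifier for the job/notebook
// and a fresh identifier for this particular run.
spark.conf.set("myapp.lineage.jobId", "daily-sales-notebook")
spark.conf.set("myapp.lineage.runId", UUID.randomUUID().toString)
```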
New API example:
There is also a `NoopUserExtraMetaDataProvider` class with all `forXXXX()` methods returning an empty `Map`. You can extend that class and override only the methods that you need (see the sketch below). For codeless mode, a configuration property can be used to instantiate a custom `UserExtraMetaDataProvider`.
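For illustration, a minimal sketch of such a subclass. The package paths, the `forExecPlan` signature, and the assumption that `HarvestingContext` exposes the `SparkSession` are all assumptions here and may differ between agent versions; the `myapp.lineage.*` property names are the made-up ones from the background section above:

```scala
// Assumed package paths and signatures; verify against the agent version in use.
import za.co.absa.spline.harvester.HarvestingContext
import za.co.absa.spline.harvester.extra.NoopUserExtraMetaDataProvider
import za.co.absa.spline.producer.model.ExecutionPlan

class JobRunMetadataProvider extends NoopUserExtraMetaDataProvider {

  // Only the execution-plan hook is overridden; every other forXXXX() method
  // keeps the no-op behaviour (empty Map) inherited from the base class.
  override def forExecPlan(plan: ExecutionPlan, ctx: HarvestingContext): Map[String, Any] = {
    val conf = ctx.session.conf // assumes the context exposes the SparkSession
    Map(
      "jobId" -> conf.get("myapp.lineage.jobId", "unknown"),
      "runId" -> conf.get("myapp.lineage.runId", "unknown")
    )
  }
}
```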
Just updating… got it working. I had to implement `StandardSplineConfigurationStack` myself, as it is not present in 0.4 and the same logic sits in a private function. Also, for additional flexibility, I read one property from the Spark configuration that holds semicolon-separated keys; those keys are then looked up and their values passed along with the `executionPlan`.
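A sketch of that last idea, under the same assumptions as above (the `myapp.lineage.forwardedKeys` property name and the shape of `HarvestingContext` are assumptions, not part of the agent's documented API):

```scala
import za.co.absa.spline.harvester.HarvestingContext
import za.co.absa.spline.harvester.extra.NoopUserExtraMetaDataProvider
import za.co.absa.spline.producer.model.ExecutionPlan

class ForwardingMetadataProvider extends NoopUserExtraMetaDataProvider {

  // Hypothetical property listing which Spark config keys should be copied into
  // the lineage metadata, e.g. "myapp.lineage.jobId;myapp.lineage.runId".
  private val KeyListProperty = "myapp.lineage.forwardedKeys"

  override def forExecPlan(plan: ExecutionPlan, ctx: HarvestingContext): Map[String, Any] = {
    val conf = ctx.session.conf // assumes the context exposes the SparkSession
    conf.getOption(KeyListProperty)
      .map(_.split(";").map(_.trim).filter(_.nonEmpty))
      .getOrElse(Array.empty[String])
      .flatMap(key => conf.getOption(key).map(value => key -> value))
      .toMap
  }
}
```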