Correct OpenLineage spec
Fix the references in the current OpenLineage spec so that it correctly references the child elements under each facet of the OpenLineage event spec. For example, the current spec defines the Dataset facets like this:
"Dataset": {
  "type": "object",
  "properties": {
    "namespace": {
      "description": "The namespace containing that dataset",
      "type": "string",
      "example": "my-datasource-namespace"
    },
    "name": {
      "description": "The unique name for that dataset within that namespace",
      "type": "string",
      "example": "instance.schema.table"
    },
    "facets": {
      "description": "The facets for this dataset",
      "type": "object",
      "additionalProperties": {
        "$ref": "#/$defs/DatasetFacet"
      }
    }
  },
  "required": [
    "namespace",
    "name"
  ]
},
But because of that, the facets section does not state which child properties may be added to it, so users have no way to determine what is legal and what is not.
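To make the gap concrete, here is a minimal Python sketch that inspects a trimmed copy of the definition above (descriptions and examples removed). The helper names `declared_facet_names` and `facet_value_ref` are hypothetical, written only for this illustration; they are not part of any OpenLineage tooling.

```python
import json

# Trimmed copy of the "Dataset" definition quoted above.
CURRENT_DATASET = json.loads("""
{
  "type": "object",
  "properties": {
    "namespace": {"type": "string"},
    "name": {"type": "string"},
    "facets": {
      "type": "object",
      "additionalProperties": {"$ref": "#/$defs/DatasetFacet"}
    }
  },
  "required": ["namespace", "name"]
}
""")


def declared_facet_names(dataset_schema):
    """Return the facet keys that the schema names explicitly under `facets`."""
    facets = dataset_schema.get("properties", {}).get("facets", {})
    return sorted(facets.get("properties", {}))


def facet_value_ref(dataset_schema):
    """Return the $ref that every otherwise unnamed facet value points to, if any."""
    facets = dataset_schema.get("properties", {}).get("facets", {})
    extra = facets.get("additionalProperties")
    return extra.get("$ref") if isinstance(extra, dict) else None


print(declared_facet_names(CURRENT_DATASET))  # []
print(facet_value_ref(CURRENT_DATASET))       # #/$defs/DatasetFacet
```

The empty list is the point: the schema says what shape each facet value has (a `DatasetFacet`), but it names no facet keys, so nothing a reader or a documentation generator can see in this definition tells them which facets exist.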
The proposed solution is to list the known child facets explicitly, like this:
"Dataset": {
  "type": "object",
  "properties": {
    "namespace": {
      "description": "The namespace containing that dataset",
      "type": "string",
      "example": "my-datasource-namespace"
    },
    "name": {
      "description": "The unique name for that dataset within that namespace",
      "type": "string",
      "example": "instance.schema.table"
    },
    "facets": {
      "description": "The common facets for this dataset",
      "type": "object",
      "allOf": [
        { "$ref": "facets/ColumnLineageDatasetFacet.json" },
        { "$ref": "facets/DatasourceDatasetFacet.json" },
        { "$ref": "facets/DataQualityAssertionsDatasetFacet.json" },
        { "$ref": "facets/LifecycleStateChangeDatasetFacet.json" },
        { "$ref": "facets/OwnershipDatasetFacet.json" },
        { "$ref": "facets/SchemaDatasetFacet.json" },
        { "$ref": "facets/StorageDatasetFacet.json" },
        { "$ref": "facets/SymlinksDatasetFacet.json" },
        { "$ref": "facets/DatasetVersionDatasetFacet.json" }
      ],
      "additionalProperties": {
        "$ref": "#/$defs/DatasetFacet"
      }
    }
  },
  "required": [
    "namespace",
    "name"
  ]
},
That way, the spec files reference the concrete facet definitions.
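As a rough illustration of what the proposed structure exposes, the sketch below collects the `$ref` targets from the `allOf` list of the proposed `facets` definition. The function name `referenced_facet_files` is hypothetical. The code only lists the references; it does not resolve the files or validate anything against them, and it makes no claim about how JSON Schema or OpenAPI tooling would interpret `allOf` here.

```python
# The "facets" part of the proposed definition, copied from above.
PROPOSED_FACETS = {
    "description": "The common facets for this dataset",
    "type": "object",
    "allOf": [
        {"$ref": "facets/ColumnLineageDatasetFacet.json"},
        {"$ref": "facets/DatasourceDatasetFacet.json"},
        {"$ref": "facets/DataQualityAssertionsDatasetFacet.json"},
        {"$ref": "facets/LifecycleStateChangeDatasetFacet.json"},
        {"$ref": "facets/OwnershipDatasetFacet.json"},
        {"$ref": "facets/SchemaDatasetFacet.json"},
        {"$ref": "facets/StorageDatasetFacet.json"},
        {"$ref": "facets/SymlinksDatasetFacet.json"},
        {"$ref": "facets/DatasetVersionDatasetFacet.json"},
    ],
    "additionalProperties": {"$ref": "#/$defs/DatasetFacet"},
}


def referenced_facet_files(facets_schema):
    """Return the $ref targets listed under `allOf`, in declaration order."""
    return [
        entry["$ref"]
        for entry in facets_schema.get("allOf", [])
        if isinstance(entry, dict) and "$ref" in entry
    ]


for path in referenced_facet_files(PROPOSED_FACETS):
    print(path)
```

Unlike the current definition, this one gives a reader nine concrete files to look at. Whether that is enough for generated documentation to show the facets is a separate question that this sketch does not answer.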
Issue Analytics
- State:
- Created a year ago
- Comments:6 (1 by maintainers)
Sure, that goal makes sense. Unfortunately, your spec changes aren’t going to accomplish that, per the OpenAPI documentation I linked to. We need another way of generating more comprehensive documentation from the spec.
I see, and am open to anything better. This issue is not a PR, so I’d be happy to see if we could highlight any shortcomings of the current OL spec, and make necessary changes that will have a more correct representation of the spec. If there’s anybody who wants to take a poke at this, more than happy for them to also chime in.