Correct OpenLineage spec

Fix the references in the current OpenLineage spec so that it correctly references the child elements under each "facets" section of the OpenLineage event spec. For example, the current spec defines the Dataset facets as follows:

    "Dataset": {
      "type": "object",
      "properties": {
        "namespace": {
          "description": "The namespace containing that dataset",
          "type": "string",
          "example": "my-datasource-namespace"
        },
        "name": {
          "description": "The unique name for that dataset within that namespace",
          "type": "string",
          "example": "instance.schema.table"
        },
        "facets": {
          "description": "The facets for this dataset",
          "type": "object",
          "additionalProperties": {
            "$ref": "#/$defs/DatasetFacet"
          }
        }
      },
      "required": [
        "namespace",
        "name"
      ]
    },

Because "facets" only declares "additionalProperties" pointing at the generic DatasetFacet, the schema does not say which named facets may be added under it or what shape each one must have, so users have no way to determine what is legal and what is not.
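To make the gap concrete, here is a minimal Python sketch of what the current "facets" definition enforces. It is not a JSON Schema validator: it hand-codes only the single rule that "additionalProperties" applies, and it assumes, as a simplification of OpenLineage's generic facet definition, that every facet must carry string "_producer" and "_schemaURL" fields. The function name generic_facet_errors is hypothetical.

```python
# Simplified model of the current rule: every value under "facets" must match
# the generic facet shape. Assumption: that shape is reduced here to two
# required string fields, "_producer" and "_schemaURL".
GENERIC_REQUIRED = ("_producer", "_schemaURL")


def generic_facet_errors(facets: dict) -> list[str]:
    """Check each facet value against the generic shape only.

    The facet key is never inspected, mirroring
    "additionalProperties": {"$ref": "#/$defs/DatasetFacet"}.
    """
    errors = []
    for key, value in facets.items():
        if not isinstance(value, dict):
            errors.append(f"{key}: facet must be an object")
            continue
        for field in GENERIC_REQUIRED:
            if not isinstance(value.get(field), str):
                errors.append(f"{key}: missing string field {field!r}")
    return errors


base = {"_producer": "https://example.com/producer",
        "_schemaURL": "https://example.com/schema.json"}

# A misspelled facet key and a wrongly shaped "schema" facet are both
# accepted, because only the generic fields are checked.
facets = {
    "schmea": dict(base, fields=[{"name": "id", "type": "int"}]),
    "schema": dict(base, fields="not-a-list"),
}
print(generic_facet_errors(facets))  # → []
```

Both a misspelled facet key and a "schema" facet whose "fields" is not a list pass this check, which is the ambiguity described above.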

The proposed solution is to list the specific child facets explicitly, as follows:

    "Dataset": {
      "type": "object",
      "properties": {
        "namespace": {
          "description": "The namespace containing that dataset",
          "type": "string",
          "example": "my-datasource-namespace"
        },
        "name": {
          "description": "The unique name for that dataset within that namespace",
          "type": "string",
          "example": "instance.schema.table"
        },
        "facets": {
          "description": "The common facets for this dataset",
          "type": "object",
          "allOf": [
            { "$ref": "facets/ColumnLineageDatasetFacet.json" },
            { "$ref": "facets/DatasourceDatasetFacet.json" },
            { "$ref": "facets/DataQualityAssertionsDatasetFacet.json" },
            { "$ref": "facets/LifecycleStateChangeDatasetFacet.json" },
            { "$ref": "facets/OwnershipDatasetFacet.json" },
            { "$ref": "facets/SchemaDatasetFacet.json" },
            { "$ref": "facets/StorageDatasetFacet.json" },
            { "$ref": "facets/SymlinksDatasetFacet.json" },
            { "$ref": "facets/DatasetVersionDatasetFacet.json" }
          ],
          "additionalProperties": {
            "$ref": "#/$defs/DatasetFacet"
          }
        }
      },
      "required": [
        "namespace",
        "name"
      ]
    },

This way, the spec files contain the properly referenced elements.
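The sketch below (Python, standard library only) hand-rolls a tiny subset of JSON Schema ("type", "required", "properties", "allOf" and schema-valued "additionalProperties") to illustrate how the proposed composition would be evaluated. Every subschema listed in "allOf" is applied to the whole "facets" object, so a facet file that declares a "schema" property constrains that key, while unknown keys still fall back to the generic facet shape. The facet schemas are hypothetical, heavily simplified stand-ins for files such as facets/SchemaDatasetFacet.json, and validate is an illustrative helper, not a replacement for a full JSON Schema implementation.

```python
# Tiny, illustrative subset of JSON Schema: "type", "required", "properties",
# "allOf" and schema-valued "additionalProperties" only. Boolean
# additionalProperties, "$ref" and all other keywords are NOT handled.
TYPES = {"object": dict, "array": list, "string": str}


def validate(instance, schema: dict, path: str = "facets") -> list[str]:
    """Return a list of error messages; an empty list means valid."""
    expected = schema.get("type")
    if expected and not isinstance(instance, TYPES[expected]):
        return [f"{path}: expected {expected}"]
    errors = []
    # Every allOf subschema is applied to the same instance.
    for sub in schema.get("allOf", []):
        errors += validate(instance, sub, path)
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing {key!r}")
        props = schema.get("properties", {})
        for key, value in instance.items():
            child = f"{path}.{key}"
            if key in props:
                errors += validate(value, props[key], child)
            # Only keys absent from *this* schema's "properties" are
            # additional; properties declared inside allOf do not count.
            elif "additionalProperties" in schema:
                errors += validate(value, schema["additionalProperties"], child)
    return errors


# Hypothetical, simplified stand-ins; not the real OpenLineage schema files.
GENERIC_FACET = {"type": "object", "required": ["_producer", "_schemaURL"]}
SCHEMA_FACET_FILE = {  # plays the role of facets/SchemaDatasetFacet.json
    "type": "object",
    "properties": {
        "schema": {
            "type": "object",
            "allOf": [GENERIC_FACET],
            "required": ["fields"],
            "properties": {"fields": {"type": "array"}},
        }
    },
}
CURRENT_FACETS = {"type": "object", "additionalProperties": GENERIC_FACET}
PROPOSED_FACETS = {
    "type": "object",
    "allOf": [SCHEMA_FACET_FILE],
    "additionalProperties": GENERIC_FACET,
}

base = {"_producer": "https://example.com/p", "_schemaURL": "https://example.com/s"}
bad = {"schema": dict(base, fields="not-a-list"), "custom": dict(base)}

print(validate(bad, CURRENT_FACETS))   # → []
print(validate(bad, PROPOSED_FACETS))  # → ['facets.schema.fields: expected array']
```

One detail worth knowing about this design: in JSON Schema, "additionalProperties" only considers "properties" and "patternProperties" declared in the same schema object, not those inside "allOf" subschemas. Under the proposal every key, including "schema", is therefore also checked against the generic facet schema. That is harmless as long as each specific facet also satisfies the generic shape, but it means pairing "allOf" with "additionalProperties": false would reject every facet; newer JSON Schema drafts add "unevaluatedProperties" for that case.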

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 6 (1 by maintainers)

Top GitHub Comments

1 reaction
collado-mike commented, Nov 3, 2022

Sure, that goal makes sense. Unfortunately, your spec changes aren’t going to accomplish that, per the OpenAPI documentation I linked to. We need another way of generating more comprehensive documentation from the spec.

0 reactions
howardyoo commented, Nov 3, 2022

I see, and I am open to anything better. This issue is not a PR, so I'd be happy to use it to highlight any shortcomings of the current OL spec and make the changes needed for a more correct representation of the spec. If anybody wants to take a poke at this, they are more than welcome to chime in.
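As one possible direction for the "another way of generating more comprehensive documentation from the spec" mentioned above, here is a small Python sketch that indexes facet schema files by the key each facet takes under "facets". It assumes, without checking against the current OpenLineage repository, that each facet file declares a top-level "properties" object keyed by facet name, and that the files sit in a local facets/ directory like the paths in the proposal. The helper names load_facet_dir and facet_index are hypothetical.

```python
import json
from pathlib import Path


def load_facet_dir(directory: str) -> dict[str, dict]:
    """Parse every *.json file in a local facets/ directory (assumed layout)."""
    return {
        path.name: json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(Path(directory).glob("*.json"))
    }


def facet_index(facet_schemas: dict[str, dict]) -> list[tuple[str, str]]:
    """Return sorted (facet key, file name) pairs.

    Assumption: each facet file has a top-level "properties" object whose
    keys are the names the facet uses under "facets".
    """
    rows = []
    for filename, schema in facet_schemas.items():
        for key in schema.get("properties", {}):
            rows.append((key, filename))
    return sorted(rows)


if __name__ == "__main__":
    # Illustrative, heavily simplified file contents; not copied from the spec.
    example = {
        "SymlinksDatasetFacet.json": {
            "properties": {"symlinks": {"$ref": "#/$defs/SymlinksDatasetFacet"}}
        },
        "SchemaDatasetFacet.json": {
            "properties": {"schema": {"$ref": "#/$defs/SchemaDatasetFacet"}}
        },
    }
    for key, filename in facet_index(example):
        print(f"{key:<10} {filename}")
```

An index like this could feed a generated reference page, or be used to check that an explicit list of facet references stays in sync with the facet files.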
