[PROPOSAL] Add support for `debug` or `verbose` mode in clients

Problem

OpenLineage integrations emit core-specific facets, but also developer-specific facets that fall into the following categories:

  • debug: used for improving our integrations (ex: the spark_unknown facet is used to capture spark plan metadata for nodes not yet supported)
  • verbose: used for diagnostic information (ex: the spark.logicalPlan facet is used to analyze the logical plan of your spark job)

Developer-specific facets are useful, but, as we’ve seen in production, such facets can be very large, change infrequently from run to run, and degrade performance for the system where the integration is configured. For example, complex spark logical plans can exceed 1MB in size; the spark.logicalPlan facet is captured along with every request, resulting not only in storage costs but also in increased network overhead. In this proposal, we outline the introduction and usage of two modes: debug and verbose. We also discuss expanding the openlineage.yml to support these modes.

Add debugMode and verboseMode to openlineage.yml

OpenLineage clients are configured using openlineage.yml (see the Configuration section for openlineage-java). Within the configuration file, users can define the transport used to emit OL events (ex: HttpTransport). Below, we extend openlineage.yml to support the configuration of debugMode and verboseMode:

debugMode:
  enabled: <bool> # Enables debug facets to be emitted along with OL events (ex: 'spark_unknown')
  facets: [array] # Provides a way to specify one or more debug facets to capture (default: 'All')
verboseMode:
  enabled: <bool> # Enables diagnostic information to be emitted along with OL events (ex: 'spark.logicalPlan')
  facets: [array] # Provides a way to specify one or more verbose facets to capture (default: 'All')
transport:
  type: <type>
  # ... transport specific configuration

Example usage of debugMode and verboseMode in ol-spark

# Enable 'debugMode' with specific facets to capture
debugMode:
  enabled: true
  facets: ['spark_unknown'] # Only capture the 'spark_unknown' facet
# Enable 'verboseMode' with facets not defined (defaults to 'All')
verboseMode:
  enabled: true
  # When 'facets' is not provided, all facets are captured by the integration
transport:
  type: HTTP
  url: http://localhost:5000
  auth:
    type: api_key
    api_key: f38d2189-c603-4b46-bdea-e573a3b5a7d5

Then, we can define class OpenLineageOptions in our clients and use it in class InternalEventHandlerFactory to determine which facets are enabled (treat the code below only as an example usage; it may not compile 😅):

  @Override
  public Collection<CustomFacetBuilder<?, ? extends RunFacet>> createRunFacetBuilders(
      OpenLineageContext context) {
    // Start with the builders contributed by the registered event handler factories.
    ImmutableList.Builder<CustomFacetBuilder<?, ? extends RunFacet>> listBuilder =
        ImmutableList.<CustomFacetBuilder<?, ? extends RunFacet>>builder()
            .addAll(
                generate(
                    eventHandlerFactories, factory -> factory.createRunFacetBuilders(context)))
            // OpenLineageOptions is the class proposed here; facets() is assumed to return
            // the collection of facet builders enabled for that mode (empty when disabled).
            .addAll(OpenLineageOptions.debugMode().facets())
            .addAll(OpenLineageOptions.verboseMode().facets())
            .add(new SparkVersionFacetBuilder(context));
    if (DatabricksEnvironmentFacetBuilder.isDatabricksRuntime()) {
      listBuilder.add(new DatabricksEnvironmentFacetBuilder());
    }
    return listBuilder.build();
  }
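
The selection rule implied by the configuration above (a mode must be enabled, and an omitted facets list means 'All') could be sketched roughly as follows. FacetModeConfig and shouldEmit are hypothetical names used only for illustration; they are not part of openlineage-java, and treating a missing or empty list as 'All' is an assumption based on the YAML comments.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical holder for one mode ('debugMode' or 'verboseMode') read from openlineage.yml. */
final class FacetModeConfig {
  private final boolean enabled;
  private final Set<String> facets; // an empty set stands for the default, 'All'

  FacetModeConfig(boolean enabled, List<String> facets) {
    this.enabled = enabled;
    // Assumption: a missing 'facets' key is equivalent to selecting every facet.
    this.facets =
        facets == null ? Collections.<String>emptySet() : new LinkedHashSet<>(facets);
  }

  /** A facet is emitted only when the mode is on and the facet is selected (or nothing was selected). */
  boolean shouldEmit(String facetName) {
    if (!enabled) {
      return false;
    }
    return facets.isEmpty() || facets.contains(facetName);
  }

  public static void main(String[] args) {
    // Mirrors the ol-spark example: debugMode lists one facet, verboseMode lists none.
    FacetModeConfig debugMode = new FacetModeConfig(true, Arrays.asList("spark_unknown"));
    FacetModeConfig verboseMode = new FacetModeConfig(true, null);
    FacetModeConfig disabled = new FacetModeConfig(false, null);

    System.out.println(debugMode.shouldEmit("spark_unknown"));       // true
    System.out.println(debugMode.shouldEmit("spark.logicalPlan"));   // false
    System.out.println(verboseMode.shouldEmit("spark.logicalPlan")); // true
    System.out.println(disabled.shouldEmit("spark_unknown"));        // false
  }
}
```

A check like this would let an integration skip building a large facet such as spark.logicalPlan entirely when it is not selected, rather than building it and dropping it afterwards.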

Issue Analytics

  • State: closed
  • Created a year ago
  • Reactions: 2
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

2 reactions
collado-mike commented, Nov 7, 2022

I think facets like the spark_unknown are fundamentally different from the spark_logicalPlan facet because of who they serve. The former is really only useful for OpenLineage developers (and those who want to fill their own gaps without working in the OpenLineage codebase). The latter is really there to serve the developers who own the pipelines. Although Marquez doesn’t do anything with that facet currently, the intention is to give people visibility into how their logical plans change from one run to the next or one job version to the next.

Thus, I can see the usefulness of wanting to turn off one set of facets without turning off the other set. But I don’t think debug and verbose communicate that. Personally, I’d vote for more explicit configuration that would allow users to turn on/off specific facets one by one. I think there is enough precedent for enabled and disabled sets that users can either explicitly allow certain facets or selectively disable undesirable ones.

1 reaction
pawel-big-lebowski commented, Nov 7, 2022

I am super happy to see such an improvement proposal 🚀 Wouldn’t a single debug or verbose mode be enough to achieve the same goal? By default, they include all facets and have the same behaviour.
