[CircleCI/CD] integration test in spark 3.3.1 fails due to some unknown reason

Problem

The following failed integration test is observed when the CircleCI integration test workflow runs the integration tests for Spark 3.3.1 (https://app.circleci.com/pipelines/github/OpenLineage/OpenLineage/5068/workflows/1cc23ca0-a3c6-4480-8de4-7efc7aa603f0/jobs/65033).

Upon inspecting the test failure, you will see the following message:

64% EXECUTING [10m 7s]io.openlineage.spark.agent.SparkContainerIntegrationTest [10] spark_v2_drop.py, pysparkV2DropTableStartEvent.json, pysparkV2DropTableCompleteEvent.json, true FAILED (11.8s)

The source code location that initiates this test is here: https://github.com/OpenLineage/OpenLineage/blob/a6c947854ddc5189eca8318ebec59343a5467bc9/integration/spark/app/src/test/java/io/openlineage/spark/agent/SparkContainerIntegrationTest.java#L324

The spark_v2_drop.py script appears to be located here: https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/app/src/test/resources/spark_scripts/spark_v2_drop.py

The test expects that dropping the test table produces the following START and COMPLETE events:

  1. job open_lineage_integration_v2_commands.drop_table in START event
  2. job open_lineage_integration_v2_commands.drop_table in COMPLETE event

That is all the integration test is trying to verify. However, the results contain the following events instead:

  1. job open_lineage_integration_v2_commands.create_table in START
  2. job open_lineage_integration_v2_commands.create_table in COMPLETE
  3. job open_lineage_integration_v2_commands.append_data in START
  4. job open_lineage_integration_v2_commands.append_data in START
  5. job open_lineage_integration_v2_commands.append_data in COMPLETE
  6. job open_lineage_integration_v2_commands.append_data in COMPLETE

The targeted file is /tmp/v2_drop/db.drop_table_test. These events clearly do NOT match what is expected, so the test fails.
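
The mismatch described above can be made explicit by diffing the expected and observed (job name, event type) pairs. The following is a minimal sketch in Python, not the actual assertion logic of SparkContainerIntegrationTest. The function name `diff_events` and the tuple representation are assumptions made for illustration; only the job names and event types come from the lists above. Whether the real test tolerates extra events is not stated in this report.

```python
from collections import Counter

PREFIX = "open_lineage_integration_v2_commands."

# Events the test expects, as listed in this report.
expected = [
    (PREFIX + "drop_table", "START"),
    (PREFIX + "drop_table", "COMPLETE"),
]

# Events seen in the failed run, as listed in this report.
observed = [
    (PREFIX + "create_table", "START"),
    (PREFIX + "create_table", "COMPLETE"),
    (PREFIX + "append_data", "START"),
    (PREFIX + "append_data", "START"),
    (PREFIX + "append_data", "COMPLETE"),
    (PREFIX + "append_data", "COMPLETE"),
]


def diff_events(expected, observed):
    """Hypothetical helper: return (missing, unexpected) as Counters."""
    exp, obs = Counter(expected), Counter(observed)
    # Counter subtraction keeps only positive counts, so duplicates
    # such as the two append_data START events are counted correctly.
    return exp - obs, obs - exp


missing, unexpected = diff_events(expected, observed)
for (job, event_type), n in sorted(missing.items()):
    print(f"missing    {n}x {job} [{event_type}]")
for (job, event_type), n in sorted(unexpected.items()):
    print(f"unexpected {n}x {job} [{event_type}]")
```

Run against the lists from the failed job, this reports both drop_table events as missing, which matches the observation that no drop_table event was ever emitted.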

However, as outlined in this Circle CI/CD run: https://app.circleci.com/pipelines/github/OpenLineage/OpenLineage/5069/workflows/198010e0-cd25-4ea6-a8c2-d2ff017f223f/jobs/65039

spark_v2_drop.py, pysparkV2DropTableStartEvent.json, pysparkV2DropTableCompleteEvent.json, true PASSED

The same spark_v2_drop.py does run successfully without any problems.

So the issue is that we do NOT know exactly what differs between the successful run and the failed run of this integration test. The failure keeps recurring, but with some randomness (or at least we do not currently know whether there is a pattern to it).

However, one thing I noticed is that the drop table script (spark_v2_drop.py) also creates a table and appends data. So it is possible that, due to some unknown issue, the drop table step never completed successfully (or hung?), and the integration test therefore never received the proper drop table events, which resulted in the failure.
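
One way to tell "the drop table events never arrived" apart from "they arrived late" would be to poll for them with a deadline. The sketch below is an assumption-laden illustration, not part of the OpenLineage test suite: `wait_for_events` and `fake_fetch` are hypothetical names, and the event source is simulated rather than read from a real container.

```python
import time


def wait_for_events(fetch_events, expected, timeout_s=30.0, interval_s=0.5):
    """Poll fetch_events() until all expected pairs are seen or time runs out.

    Returns the set of expected (job, event_type) pairs still missing;
    an empty set means every expected event eventually arrived.
    """
    deadline = time.monotonic() + timeout_s
    remaining = set(expected)
    while True:
        remaining -= set(fetch_events())
        if not remaining or time.monotonic() >= deadline:
            return remaining
        time.sleep(interval_s)


PREFIX = "open_lineage_integration_v2_commands."
expected_drop = [
    (PREFIX + "drop_table", "START"),
    (PREFIX + "drop_table", "COMPLETE"),
]

# Simulated source: the drop_table events only show up on the third poll.
batches = iter([
    [],
    [(PREFIX + "create_table", "START")],
    expected_drop,
])
seen = []


def fake_fetch():
    # Accumulate events, as a real event collector would.
    seen.extend(next(batches, []))
    return list(seen)


late = wait_for_events(fake_fetch, expected_drop, timeout_s=2.0, interval_s=0.01)
print("still missing:", late)
```

If the returned set is non-empty after a generous timeout, the events were most likely never emitted, which would point at the script itself rather than at timing in the test harness.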

Issue Analytics

  • State: closed
  • Created 10 months ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

howardyoo commented, Dec 12, 2022

I no longer see this issue recurring, so I am closing it.
