[CircleCI/CD] integration test in spark 3.3.1 fails due to some unknown reason
Problem
The following integration test failure is observed when the CircleCI integration test workflow runs the integration tests for Spark 3.3.1: https://app.circleci.com/pipelines/github/OpenLineage/OpenLineage/5068/workflows/1cc23ca0-a3c6-4480-8de4-7efc7aa603f0/jobs/65033
Upon inspecting the test failure, you will see the following message:
64% EXECUTING [10m 7s]io.openlineage.spark.agent.SparkContainerIntegrationTest [10] spark_v2_drop.py, pysparkV2DropTableStartEvent.json, pysparkV2DropTableCompleteEvent.json, true FAILED (11.8s)
The source code location for initiating this test is here: https://github.com/OpenLineage/OpenLineage/blob/a6c947854ddc5189eca8318ebec59343a5467bc9/integration/spark/app/src/test/java/io/openlineage/spark/agent/SparkContainerIntegrationTest.java#L324
The spark_v2_drop.py script seems to be located here:
https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/app/src/test/resources/spark_scripts/spark_v2_drop.py
The test expects that ‘dropping’ the test table produces the following start and complete events:
- job open_lineage_integration_v2_commands.drop_table in START event
- job open_lineage_integration_v2_commands.drop_table in COMPLETE event
And that is what the integration test is trying to do all along. However, the results contain the following events:
- job open_lineage_integration_v2_commands.create_table in START
- job open_lineage_integration_v2_commands.create_table in COMPLETE
- job open_lineage_integration_v2_commands.append_data in START
- job open_lineage_integration_v2_commands.append_data in START
- job open_lineage_integration_v2_commands.append_data in COMPLETE
- job open_lineage_integration_v2_commands.append_data in COMPLETE
The file targeted is /tmp/v2_drop/db.drop_table_test.
These events obviously do NOT match what is expected, and that results in the failure.
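To make the mismatch concrete, here is a minimal Python sketch that compares the expected and observed events by job name and event type only. It is an illustration under assumptions: the tuples are a simplified stand-in for the real OpenLineage event JSON, the missing_events helper is hypothetical, and this is not the matching logic the integration test itself uses.

```python
from collections import Counter

# Simplified stand-ins for the real events: (job name, event type) only.
PREFIX = "open_lineage_integration_v2_commands."

EXPECTED = [
    (PREFIX + "drop_table", "START"),
    (PREFIX + "drop_table", "COMPLETE"),
]

# The events listed above for the failed run.
OBSERVED = [
    (PREFIX + "create_table", "START"),
    (PREFIX + "create_table", "COMPLETE"),
    (PREFIX + "append_data", "START"),
    (PREFIX + "append_data", "START"),
    (PREFIX + "append_data", "COMPLETE"),
    (PREFIX + "append_data", "COMPLETE"),
]


def missing_events(expected, observed):
    """Return the expected (job, event type) pairs that were never observed."""
    remaining = Counter(observed)
    missing = []
    for pair in expected:
        if remaining[pair] > 0:
            remaining[pair] -= 1  # consume one matching observed event
        else:
            missing.append(pair)
    return missing


for job, event_type in missing_events(EXPECTED, OBSERVED):
    print(f"missing: {job} in {event_type}")
```

With the events from the failed run, both drop_table events are reported as missing, which is consistent with the drop step never emitting its events.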
However, as outlined in this Circle CI/CD run: https://app.circleci.com/pipelines/github/OpenLineage/OpenLineage/5069/workflows/198010e0-cd25-4ea6-a8c2-d2ff017f223f/jobs/65039
spark_v2_drop.py, pysparkV2DropTableStartEvent.json, pysparkV2DropTableCompleteEvent.json, true PASSED
The same spark_v2_drop.py does run successfully without any problems.
So, the issue is that we do NOT know exactly what the difference is between the successful run and the failed run of this integration test. The failure keeps happening, but with some randomness (or at least we do not currently know whether there is a pattern to it).

However, one thing that I noticed is that the drop table script (spark_v2_drop.py) also creates the table and appends data. So it is possible that, due to some unknown issue, the drop table part never completed successfully (or hung), and thus the integration test might not have received the proper drop table events, which resulted in the failure.
Issue Analytics
- Created 10 months ago
- Comments:5 (2 by maintainers)
Ooh man, the Spark community reverted the change that was fixing this: https://github.com/apache/spark/commit/8ee12bb3d4682b386af30464180a0845bfc0c24d#diff-fdd1e9e26aa1ba9d1cc923ee7c84a1935dcc285502330a471f1ade7f3ad08bf9L109
I no longer see this issue recurring, so I am closing this.