
Flink: UT testPrimaryKeyFieldsAtEndOfTableSchema fails probabilistically

See original GitHub issue
Test testPrimaryKeyFieldsAtEndOfTableSchema[catalogName=testhadoop, baseNamespace=default, format=PARQUET, isStreaming=true](org.apache.iceberg.flink.TestFlinkUpsert) failed with:
java.lang.AssertionError: 
Expecting:
  [+I[3, bbb, 2022-03-01], +I[1, aaa, 2022-03-01]]
to contain exactly in any order:
  [+I[2, aaa, 2022-03-01], +I[3, bbb, 2022-03-01]]
elements not found:
  [+I[2, aaa, 2022-03-01]]
and elements not expected:
  [+I[1, aaa, 2022-03-01]]

	at org.apache.iceberg.flink.TestHelpers.assertRows(TestHelpers.java:137)
	at org.apache.iceberg.flink.TestFlinkUpsert.testPrimaryKeyFieldsAtEndOfTableSchema(TestFlinkUpsert.java:262)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
	at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
	at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
	at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
	at org.junit.runners.Suite.runChild(Suite.java:128)
	at org.junit.runners.Suite.runChild(Suite.java:27)
	at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
	at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
	at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
	at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
	at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
	at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
	at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
	at org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
	at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
	at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
	at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33)
	at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:94)
	at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
	at org.gradle.api.internal.tasks.testing.worker.TestWorker$2.run(TestWorker.java:176)
	at org.gradle.api.internal.tasks.testing.worker.TestWorker.executeAndMaintainThreadName(TestWorker.java:129)
	at org.gradle.api.internal.tasks.testing.worker.TestWorker.execute(TestWorker.java:100)
	at org.gradle.api.internal.tasks.testing.worker.TestWorker.execute(TestWorker.java:60)
	at org.gradle.process.internal.worker.child.ActionExecutionWorker.execute(ActionExecutionWorker.java:56)
	at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:133)
	at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:71)
	at worker.org.gradle.process.internal.worker.GradleWorkerMain.run(GradleWorkerMain.java:69)
	at worker.org.gradle.process.internal.worker.GradleWorkerMain.main(GradleWorkerMain.java:74)

It looks like upsert is not working properly, which is causing the flakiness of this UT.

sample github run with failure : https://github.com/apache/iceberg/runs/5846777166?check_suite_focus=true

Issue Analytics

  • State:closed
  • Created a year ago
  • Comments:8 (7 by maintainers)

Top GitHub Comments

3 reactions
yittg commented, Apr 7, 2022

After some investigation, I found that the Values clause is not actually a single operator node with a tuple list; instead, it was derived into three parallel Calc nodes due to a cast. So the output order cannot be guaranteed.
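The consequence of that nondeterministic operator order can be sketched as follows. This is a minimal illustration, not Flink or Iceberg code: the `Row` record and `upsert` method are hypothetical names, and the sketch assumes last-write-wins upsert semantics keyed on the `data` column.

```java
import java.util.*;

public class UpsertOrderSketch {
    // Hypothetical row shape for this sketch: (id, data, dt), with "data" acting as the key.
    record Row(int id, String data, String dt) {}

    // Last write wins per key, mimicking upsert-mode semantics.
    static Map<String, Row> upsert(List<Row> writes) {
        Map<String, Row> table = new LinkedHashMap<>();
        for (Row r : writes) {
            table.put(r.data(), r);
        }
        return table;
    }

    public static void main(String[] args) {
        Row r1 = new Row(1, "aaa", "2022-03-01");
        Row r2 = new Row(2, "aaa", "2022-03-01");

        // If the subtask carrying r1 emits first, r2 survives (what the test expects)...
        System.out.println(upsert(List.of(r1, r2)).get("aaa"));
        // ...but if r2 happens to arrive first, r1 survives and the assertion fails.
        System.out.println(upsert(List.of(r2, r1)).get("aaa"));
    }
}
```

Because the three parallel Calc nodes race, either arrival order is possible, so the surviving row for the duplicate key flips between runs.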

3 reactions
openinx commented, Apr 7, 2022

I ran a debug session to verify those files, and I can see the files are laid out as follows:

➜  default find . 
.
./default
./a.txt
./db
./db/upsert_on_pk_at_schema_end
./db/upsert_on_pk_at_schema_end/data
./db/upsert_on_pk_at_schema_end/data/data=aaa
./db/upsert_on_pk_at_schema_end/data/data=aaa/.00000-0-f9269f62-299b-4689-8b9f-ad59d7fb539d-00002.parquet.crc
./db/upsert_on_pk_at_schema_end/data/data=aaa/00000-0-f9269f62-299b-4689-8b9f-ad59d7fb539d-00002.parquet
./db/upsert_on_pk_at_schema_end/data/data=aaa/.00000-0-f9269f62-299b-4689-8b9f-ad59d7fb539d-00001.parquet.crc
./db/upsert_on_pk_at_schema_end/data/data=aaa/00000-0-f9269f62-299b-4689-8b9f-ad59d7fb539d-00001.parquet
./db/upsert_on_pk_at_schema_end/data/data=aaa/00000-0-f9269f62-299b-4689-8b9f-ad59d7fb539d-00005.parquet
./db/upsert_on_pk_at_schema_end/data/data=aaa/.00000-0-f9269f62-299b-4689-8b9f-ad59d7fb539d-00005.parquet.crc
./db/upsert_on_pk_at_schema_end/data/data=bbb
./db/upsert_on_pk_at_schema_end/data/data=bbb/00000-0-f9269f62-299b-4689-8b9f-ad59d7fb539d-00003.parquet
./db/upsert_on_pk_at_schema_end/data/data=bbb/.00000-0-f9269f62-299b-4689-8b9f-ad59d7fb539d-00004.parquet.crc
./db/upsert_on_pk_at_schema_end/data/data=bbb/00000-0-f9269f62-299b-4689-8b9f-ad59d7fb539d-00004.parquet
./db/upsert_on_pk_at_schema_end/data/data=bbb/.00000-0-f9269f62-299b-4689-8b9f-ad59d7fb539d-00003.parquet.crc
./db/upsert_on_pk_at_schema_end/metadata
./db/upsert_on_pk_at_schema_end/metadata/version-hint.text
./db/upsert_on_pk_at_schema_end/metadata/.version-hint.text.crc
./db/upsert_on_pk_at_schema_end/metadata/e21484e0-cea6-4001-b4af-3e34c9249a88-m0.avro
./db/upsert_on_pk_at_schema_end/metadata/snap-3867050424845709517-1-e21484e0-cea6-4001-b4af-3e34c9249a88.avro
./db/upsert_on_pk_at_schema_end/metadata/e21484e0-cea6-4001-b4af-3e34c9249a88-m1.avro
./db/upsert_on_pk_at_schema_end/metadata/.v2.metadata.json.crc
./db/upsert_on_pk_at_schema_end/metadata/v2.metadata.json
./db/upsert_on_pk_at_schema_end/metadata/.v1.metadata.json.crc
./db/upsert_on_pk_at_schema_end/metadata/.snap-3867050424845709517-1-e21484e0-cea6-4001-b4af-3e34c9249a88.avro.crc
./db/upsert_on_pk_at_schema_end/metadata/.e21484e0-cea6-4001-b4af-3e34c9249a88-m1.avro.crc
./db/upsert_on_pk_at_schema_end/metadata/.e21484e0-cea6-4001-b4af-3e34c9249a88-m0.avro.crc
./db/upsert_on_pk_at_schema_end/metadata/v1.metadata.json

That means there was only one checkpoint committing those three records. In the partition data=aaa, the files contain:

#  The equality delete file.
➜  default parquet cat ./db/upsert_on_pk_at_schema_end/data/data=aaa/00000-0-f9269f62-299b-4689-8b9f-ad59d7fb539d-00002.parquet 
{"data": "aaa", "dt": 19052}

# The insert data file.
➜  default parquet cat ./db/upsert_on_pk_at_schema_end/data/data=aaa/00000-0-f9269f62-299b-4689-8b9f-ad59d7fb539d-00001.parquet 
{"id": 2, "data": "aaa", "dt": 19052}
{"id": 1, "data": "aaa", "dt": 19052}

# The positional delete file.
➜  default parquet cat ./db/upsert_on_pk_at_schema_end/data/data=aaa/00000-0-f9269f62-299b-4689-8b9f-ad59d7fb539d-00005.parquet
{"file_path": "file:/var/folders/fg/kbb4swcd0gl_3s0wlhhk9bch0000gp/T/junit3828094481211623109/default/db/upsert_on_pk_at_schema_end/data/data=aaa/00000-0-f9269f62-299b-4689-8b9f-ad59d7fb539d-00001.parquet", "pos": 0}

The tricky thing is: the record (2, 'aaa', '2022-03-01') was written before the record (1, 'aaa', '2022-03-01'); that's why we encountered the failure case.
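A rough way to see why id=1 survived: on read, the positional delete removes position 0 of the data file, i.e. whichever duplicate-key row happened to be written first. A minimal sketch (the `applyPositionalDelete` helper is hypothetical; row contents are shown as plain strings matching the parquet dump above):

```java
import java.util.*;

public class PositionalDeleteSketch {
    // Drop the row at the given file position, as applying a positional delete would on read.
    static List<String> applyPositionalDelete(List<String> dataFileRows, int pos) {
        List<String> remaining = new ArrayList<>(dataFileRows);
        remaining.remove(pos);
        return remaining;
    }

    public static void main(String[] args) {
        // Rows of the insert data file, in write order (as in the dump above).
        List<String> dataFile = List.of(
                "{id: 2, data: aaa, dt: 19052}",  // pos 0 -- written first this run
                "{id: 1, data: aaa, dt: 19052}"); // pos 1

        // The positional delete file points at pos 0, so id=2 is dropped
        // and id=1 is what the query returns, failing the assertion.
        System.out.println(applyPositionalDelete(dataFile, 0));
    }
}
```

Had the write order been reversed, the positional delete would have removed id=1 instead, and the test would have passed, which is exactly the probabilistic behavior the issue reports.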
