[SUPPORT] Missing data problem, urgent!
Tips before filing an issue
- Have you gone through our FAQs?
- Join the mailing list to engage in conversations and get faster support at dev-subscribe@hudi.apache.org.
- If you have triaged this as a bug, then file an issue directly.
Describe the problem you faced
Data written to a Hudi MOR table by Flink (consuming from Kafka) is partially missing when the table is read back with Spark; the row count differs significantly from what Flink actually consumed.
To Reproduce
Steps to reproduce the behavior:
- Use Flink to consume from Kafka and write to a Hudi MOR table in real time
- Read the MOR table with Spark 3.2.1 for pre-aggregation
- The data read back does not match the data Flink actually consumed; the difference is large (a sketch of the write path follows this list)
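For reference, a minimal sketch of the write path under stated assumptions: the Kafka topic (`user_events`), the record schema, the bootstrap servers, and the HDFS path are hypothetical placeholders rather than values taken from the actual job.

```scala
import org.apache.flink.table.api.{EnvironmentSettings, TableEnvironment}

object KafkaToHudiMor {
  def main(args: Array[String]): Unit = {
    // Streaming Table API environment (Flink 1.14).
    val settings = EnvironmentSettings.newInstance().inStreamingMode().build()
    val tEnv = TableEnvironment.create(settings)

    // Hypothetical Kafka source; topic, servers and schema are placeholders.
    tEnv.executeSql(
      """CREATE TABLE kafka_source (
        |  id STRING,
        |  ts TIMESTAMP(3),
        |  payload STRING
        |) WITH (
        |  'connector' = 'kafka',
        |  'topic' = 'user_events',
        |  'properties.bootstrap.servers' = 'kafka:9092',
        |  'properties.group.id' = 'hudi-writer',
        |  'scan.startup.mode' = 'earliest-offset',
        |  'format' = 'json'
        |)""".stripMargin)

    // Hudi MOR sink; the HDFS path is a placeholder.
    tEnv.executeSql(
      """CREATE TABLE hudi_mor_sink (
        |  id STRING,
        |  ts TIMESTAMP(3),
        |  payload STRING,
        |  PRIMARY KEY (id) NOT ENFORCED
        |) WITH (
        |  'connector' = 'hudi',
        |  'path' = 'hdfs:///warehouse/hudi/user_events',
        |  'table.type' = 'MERGE_ON_READ'
        |)""".stripMargin)

    // Continuous insert from Kafka into the MOR table.
    tEnv.executeSql("INSERT INTO hudi_mor_sink SELECT id, ts, payload FROM kafka_source")
  }
}
```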
Expected behavior
Reading the MOR table with Spark should return the same records that Flink consumed from Kafka, with no data loss.
Environment Description
- Hudi version: 0.11.1
- Spark version: 3.2.1
- Hive version: 2.3.7
- Hadoop version: 3.0.0
- Storage (HDFS/S3/GCS…): HDFS
- Running on Docker? (yes/no): no
Additional context
Add any other context about the problem here.
Stacktrace
With Flink 1.14.4 consuming from Kafka and writing to Hudi, I stopped the job once consumption finished and confirmed that the number of records Flink consumed matched what Kafka Tool reported for the topic. When I then queried the table with Spark 3.2.1, the result contained far fewer records than the real data, and some records were missing entirely. At the same time, we ran a second consumer program that wrote the same data into ClickHouse in real time; ClickHouse ended up with far more data than Hudi, and records present in ClickHouse were missing from Hudi. No exceptions were thrown at any point during the run.
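One thing worth checking when MOR counts look short in Spark is whether the read is a snapshot query (Parquet base files merged with log files) or a read-optimized query (base files only, so anything not yet compacted is invisible). Below is a rough verification sketch, assuming a hypothetical table path; `hoodie.datasource.query.type` is the standard Hudi Spark read option.

```scala
import org.apache.spark.sql.SparkSession

object CompareMorQueryTypes {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("hudi-mor-count-check").getOrCreate()

    // Placeholder path; point this at the actual MOR table.
    val basePath = "hdfs:///warehouse/hudi/user_events"

    // Snapshot query: merges base files with the MOR log files.
    val snapshotCount = spark.read.format("hudi")
      .option("hoodie.datasource.query.type", "snapshot")
      .load(basePath)
      .count()

    // Read-optimized query: base files only; uncompacted log data is skipped.
    val roCount = spark.read.format("hudi")
      .option("hoodie.datasource.query.type", "read_optimized")
      .load(basePath)
      .count()

    println(s"snapshot=$snapshotCount read_optimized=$roCount")
    spark.stop()
  }
}
```

A large gap between the two counts would point at reading the wrong view (for example the `_ro` Hive table instead of `_rt`) rather than at data being lost during the write.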
Issue Analytics
- Created a year ago
- Comments: 14 (8 by maintainers)
Top GitHub Comments
Sorry, the data loss was caused by that version. I have downgraded and have not upgraded again yet; I expect to wait for 0.12.1 before considering the upgrade.
0.12.1 is expected to solve the problem; feel free to re-open this issue if the problem still exists.