[SUPPORT] Missing data problem, urgent!
Tips before filing an issue
- Have you gone through our FAQs?
- Join the mailing list to engage in conversations and get faster support at dev-subscribe@hudi.apache.org.
- If you have triaged this as a bug, then file an issue directly.
Describe the problem you faced
Data written to a Hudi MOR table by Flink (consuming from Kafka) is partially missing when the table is read back with Spark; the row count differs significantly from what Flink actually consumed.
To Reproduce
Steps to reproduce the behavior:
- Use Flink to consume from Kafka and write to a Hudi MOR table in real time
- Read the MOR table with Spark 3.2.1 for pre-aggregation
- The data read back does not match the data Flink actually consumed; the difference is large (a sketch of the write path follows this list)
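For reference, a minimal sketch of the write path under stated assumptions: the Kafka topic (`user_events`), the record schema, the bootstrap servers, and the HDFS path are hypothetical placeholders rather than values taken from the actual job.

```scala
import org.apache.flink.table.api.{EnvironmentSettings, TableEnvironment}

object KafkaToHudiMor {
  def main(args: Array[String]): Unit = {
    // Streaming Table API environment (Flink 1.14).
    val settings = EnvironmentSettings.newInstance().inStreamingMode().build()
    val tEnv = TableEnvironment.create(settings)

    // Hypothetical Kafka source; topic, servers and schema are placeholders.
    tEnv.executeSql(
      """CREATE TABLE kafka_source (
        |  id STRING,
        |  ts TIMESTAMP(3),
        |  payload STRING
        |) WITH (
        |  'connector' = 'kafka',
        |  'topic' = 'user_events',
        |  'properties.bootstrap.servers' = 'kafka:9092',
        |  'properties.group.id' = 'hudi-writer',
        |  'scan.startup.mode' = 'earliest-offset',
        |  'format' = 'json'
        |)""".stripMargin)

    // Hudi MOR sink; the HDFS path is a placeholder.
    tEnv.executeSql(
      """CREATE TABLE hudi_mor_sink (
        |  id STRING,
        |  ts TIMESTAMP(3),
        |  payload STRING,
        |  PRIMARY KEY (id) NOT ENFORCED
        |) WITH (
        |  'connector' = 'hudi',
        |  'path' = 'hdfs:///warehouse/hudi/user_events',
        |  'table.type' = 'MERGE_ON_READ'
        |)""".stripMargin)

    // Continuous insert from Kafka into the MOR table.
    tEnv.executeSql("INSERT INTO hudi_mor_sink SELECT id, ts, payload FROM kafka_source")
  }
}
```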
Expected behavior
Reading the MOR table with Spark should return the same records that Flink consumed from Kafka, with no data loss.
Environment Description
- Hudi version: 0.11.1
- Spark version: 3.2.1
- Hive version: 2.3.7
- Hadoop version: 3.0.0
- Storage (HDFS/S3/GCS…): HDFS
- Running on Docker? (yes/no): no
Additional context
Add any other context about the problem here.
Stacktrace
With Flink 1.14.4 consuming from Kafka and writing to Hudi, I stopped the job once consumption finished and confirmed that the number of records Flink consumed matched what Kafka Tool reported for the topic. When I then queried the table with Spark 3.2.1, the result contained far fewer records than the real data, and some records were missing entirely. At the same time, we ran a second consumer program that wrote the same data into ClickHouse in real time; ClickHouse ended up with far more data than Hudi, and records present in ClickHouse were missing from Hudi. No exceptions were thrown at any point during the run.
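One thing worth checking when MOR counts look short in Spark is whether the read is a snapshot query (Parquet base files merged with log files) or a read-optimized query (base files only, so anything not yet compacted is invisible). Below is a rough verification sketch, assuming a hypothetical table path; `hoodie.datasource.query.type` is the standard Hudi Spark read option.

```scala
import org.apache.spark.sql.SparkSession

object CompareMorQueryTypes {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("hudi-mor-count-check").getOrCreate()

    // Placeholder path; point this at the actual MOR table.
    val basePath = "hdfs:///warehouse/hudi/user_events"

    // Snapshot query: merges base files with the MOR log files.
    val snapshotCount = spark.read.format("hudi")
      .option("hoodie.datasource.query.type", "snapshot")
      .load(basePath)
      .count()

    // Read-optimized query: base files only; uncompacted log data is skipped.
    val roCount = spark.read.format("hudi")
      .option("hoodie.datasource.query.type", "read_optimized")
      .load(basePath)
      .count()

    println(s"snapshot=$snapshotCount read_optimized=$roCount")
    spark.stop()
  }
}
```

A large gap between the two counts would point at reading the wrong view (for example the `_ro` Hive table instead of `_rt`) rather than at data being lost during the write.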
Issue Analytics
- Created a year ago
- Comments: 14 (8 by maintainers)
Top GitHub Comments
Sorry, the data loss was caused by that version. I have downgraded and have not upgraded again yet; I expect to wait for 0.12.1 before considering the upgrade.
0.12.1 is expected to solve the problem; feel free to re-open this issue if the problem still exists.