[SUPPORT] Missing data problem, urgent!

Describe the problem you faced

A clear and concise description of the problem.

To Reproduce

Steps to reproduce the behavior:

  1. Use Flink to consume from Kafka and write to a Hudi MOR table in real time
  2. Use Spark 3.2.1 to read the MOR table for pre-aggregation
  3. The data written to the table does not match the data Flink actually consumed, and the difference is large
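
A minimal PySpark sketch of the check in step 3, for anyone trying to reproduce the comparison. It assumes the table is read through the Hudi Spark datasource with the Hudi Spark bundle on the classpath; the helper names and the table path are hypothetical and do not come from the original report. `hoodie.datasource.query.type` is the Hudi read option that selects between a snapshot query (base files merged with log files) and a read-optimized query (base files only), so counting with both on a MOR table shows how many rows each view exposes.

```python
from typing import Dict

# Query types relevant to a MOR table count check.
VALID_QUERY_TYPES = ("snapshot", "read_optimized")


def hudi_read_options(query_type: str = "snapshot") -> Dict[str, str]:
    """Build Spark reader options for a Hudi query of the given type."""
    if query_type not in VALID_QUERY_TYPES:
        raise ValueError(f"unsupported query type: {query_type!r}")
    return {"hoodie.datasource.query.type": query_type}


def count_hudi_rows(spark, table_path: str, query_type: str = "snapshot") -> int:
    """Count the rows Spark can see for the given query type.

    `spark` is an existing SparkSession; `table_path` is the table base path.
    """
    reader = spark.read.format("hudi").options(**hudi_read_options(query_type))
    return reader.load(table_path).count()


# Hypothetical usage; the path is a placeholder, not taken from the issue:
# snapshot_rows = count_hudi_rows(spark, "hdfs:///path/to/mor_table", "snapshot")
# ro_rows = count_hudi_rows(spark, "hdfs:///path/to/mor_table", "read_optimized")
```

Comparing either count with the number of records Flink reports having written gives a concrete figure for the gap described above.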

Expected behavior

A clear and concise description of what you expected to happen.

Environment Description

  • Hudi version: 0.11.1

  • Spark version: 3.2.1

  • Hive version: 2.3.7

  • Hadoop version: 3.0.0

  • Storage (HDFS/S3/GCS…): HDFS

  • Running on Docker? (yes/no): no

Additional context

Add any other context about the problem here.

Stacktrace

I used Flink 1.14.4 to consume from Kafka and write to Hudi. Once the job had consumed all the data, I stopped it and confirmed that the data Flink had consumed matched what Kafka Tool showed. When I then checked the table through Spark 3.2.1, I read far fewer records than actually exist, and some records had disappeared entirely. At the same time, we created a new consumer program to write the same data into ClickHouse in real time. ClickHouse ended up with far more data than Hudi, and some records present in ClickHouse were missing from Hudi. No errors or abnormal behavior were observed at any point in the process.

[Two screenshots were attached to the original issue.]
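
The comparison between the two sinks can be made precise by diffing record keys instead of eyeballing totals. The following is a rough sketch under stated assumptions: the keys have already been exported from each system (for example, the ClickHouse copy as the source and the Hudi table as the target), and the function and field names are hypothetical. Duplicate keys in the source are reported separately, because a sink that upserts by record key holds one row per key and its row count can legitimately be lower.

```python
from collections import Counter
from typing import Hashable, Iterable, NamedTuple, Set


class Reconciliation(NamedTuple):
    missing_in_target: Set[Hashable]     # keys in source but not in target
    unexpected_in_target: Set[Hashable]  # keys in target but not in source
    duplicated_in_source: Set[Hashable]  # keys seen more than once in source


def reconcile_keys(
    source_keys: Iterable[Hashable], target_keys: Iterable[Hashable]
) -> Reconciliation:
    """Diff two collections of record keys exported from two sinks."""
    source_counts = Counter(source_keys)
    source_set = set(source_counts)
    target_set = set(target_keys)
    return Reconciliation(
        missing_in_target=source_set - target_set,
        unexpected_in_target=target_set - source_set,
        duplicated_in_source={k for k, n in source_counts.items() if n > 1},
    )


result = reconcile_keys(["a", "b", "b", "c"], ["a", "b", "d"])
print(result.missing_in_target)     # {'c'}
print(result.unexpected_in_target)  # {'d'}
print(result.duplicated_in_source)  # {'b'}
```

A non-empty `missing_in_target` set identifies exactly which records were lost, which is more useful in a bug report than a difference in totals.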

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 14 (8 by maintainers)

Top GitHub Comments

1 reaction
Aload commented, Aug 30, 2022

@Aload : after the patched version, do you see any data loss?

Sorry, the data loss was caused by that version. I have downgraded and have not upgraded again yet. I expect to wait until 0.12.1 before considering an upgrade.

0 reactions
danny0405 commented, Nov 8, 2022

0.12.1 is expected to solve the problem; feel free to re-open the issue if the problem still exists.
