[SUPPORT] ETL failure , Caused by: java.io.FileNotFoundException: No such file or directory

Describe the problem you faced

We are getting the following error in production in one of our end users' ETL jobs:

Caused by: java.io.FileNotFoundException: No such file or directory: s3a://bucket/cdcv2/data/in_ums/user_umfnd_s3/2cf933ef-fe51-4e41-8b0d-af7fa5ed2d85-0_87-19419-8663185_20211116163235.parquet
It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved.
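
The refresh that the error message suggests can be sketched in PySpark as follows. This is a minimal sketch, assuming `spark` is an existing SparkSession and `table_name` is a placeholder for the table the ETL reads; neither name comes from the issue. Whether a refresh helps in this case is not established by the issue itself.

```python
def refresh_table(spark, table_name):
    """Invalidate Spark's cached metadata and file listing for one table.

    Assumptions: `spark` is an already-created SparkSession and
    `table_name` is a placeholder, not a name taken from the issue.
    """
    # Equivalent to running `REFRESH TABLE <table_name>` in Spark SQL.
    spark.catalog.refreshTable(table_name)
```

A refresh only addresses a stale cached file listing. If the parquet file was already deleted before the query planned its scan, the read would be expected to fail again.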

We faced the same issue earlier and mitigated it by increasing cleaner commits to 120 in the Spark streaming job that writes to this location. For reference, the streaming job has a batch interval of 10 minutes; batches complete in about 4 minutes on average, and compaction, which is triggered after 4 commits, takes 40-50 minutes. So we retain roughly 8 hours of commits.
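
As a rough check of the retention figure, the arithmetic can be worked out as below. This is only a sketch: the "8 hours" estimate appears to come from 120 commits at about 4 minutes each, which is my reading of the numbers and not something the issue states. The helper name is hypothetical.

```python
def retention_window_hours(commits_retained, minutes_per_commit):
    """Approximate how many hours of history the retained commits cover."""
    return commits_retained * minutes_per_commit / 60


# 120 retained commits at ~4 minutes each (the average batch duration).
print(retention_window_hours(120, 4))   # 8.0

# The same 120 commits if one commit lands per 10-minute batch interval.
print(retention_window_hours(120, 10))  # 20.0
```

The practical point is that a reader which holds on to a file listing for longer than this window may try to open a file the cleaner has already removed.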

The user is running the ETL on Spark 2.x; it is a combination of Spark SQL and Spark Core.

To Reproduce

Steps to reproduce the behavior:

  1. We consistently get the same error, even after retrying the ETL.

Environment Description

  • Hudi version : 0.8

  • Spark version : 2.x

  • Hive version : 3.x

  • Hadoop version : 2.7

  • Storage (HDFS/S3/GCS…) : S3

  • Running on Docker? (yes/no) : no

Additional context

The above configs are from the older cluster where the ETL ran. All other ETLs running on Spark 3 with Hive 3 are running fine. As mentioned earlier, one of the ETLs had also failed on the newer cluster, but after we increased the cleaner commits config it has not failed there again.
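
For readers who want to see what the cleaner change might look like on the writer side, here is a minimal sketch of the relevant Hudi write options. The two option keys are standard Hudi configs; the helper name is hypothetical, and the assumption that the compaction trigger is set through `hoodie.compact.inline.max.delta.commits` is mine, since the issue does not show the job's actual configuration.

```python
def hudi_cleaner_options(commits_retained, compaction_delta_commits):
    """Build a dict of Hudi write options for cleaner retention.

    Hypothetical helper: the issue only says cleaner commits were raised
    to 120 and that compaction triggers after 4 commits.
    """
    return {
        # Number of commits whose file versions the cleaner keeps.
        "hoodie.cleaner.commits.retained": str(commits_retained),
        # Delta commits to accumulate before a compaction is scheduled.
        "hoodie.compact.inline.max.delta.commits": str(compaction_delta_commits),
    }


# Values described in the issue: 120 retained commits, compaction after 4.
options = hudi_cleaner_options(120, 4)
```

These options would typically be passed to the writer with `.options(**options)`. Archival settings such as `hoodie.keep.min.commits` may also need to stay above the cleaner retention; check the configuration reference for the Hudi version in use.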

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 7 (7 by maintainers)

Top GitHub Comments

2 reactions
veenaypatil commented, Nov 22, 2021

@xushiyan we were on version 2.3.2 on the older cluster; on the new one it is 3.0.2, where it worked. I am closing this issue as the ETL is working after migrating to Spark 3.x.

0 reactions
xushiyan commented, Nov 20, 2021

@veenaypatil which Spark 2.x version did you use exactly? Hudi supports 2.4+.
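
The version requirement mentioned in this comment could be checked up front in an ETL with something like the sketch below. The function name and the default minimum are assumptions based only on the "Hudi supports 2.4+" statement above. In a real job the version string would come from `spark.version`.

```python
def is_supported_spark(version, minimum=(2, 4)):
    """Return True if a Spark version string meets the minimum major.minor.

    Hypothetical helper; the (2, 4) default reflects the "2.4+" statement
    in the comment above, not an official compatibility matrix.
    """
    major_minor = tuple(int(part) for part in version.split(".")[:2])
    return major_minor >= minimum


print(is_supported_spark("2.3.2"))  # False (the older cluster)
print(is_supported_spark("3.0.2"))  # True (the newer cluster)
```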
