Exceptions thrown in a Flink streamlet are not logged anywhere.

Describe the bug: Exceptions thrown in a Flink streamlet are not logged anywhere. For instance, an exception thrown in buildExecutionGraph, or one thrown inside a map such as

readStream(in).map {
  // throw here
}

does not show up in any logs.
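
Until the missing logs are tracked down, one possible workaround is to log the exception yourself before it leaves the function passed to map. The sketch below is plain Scala with no Flink or Cloudflow dependency. The names `LoggedMap` and `logged` are hypothetical helpers made up for illustration, not part of either library. It assumes stderr ends up in the pod logs, which depends on your logging setup, and it does not address why the exception is swallowed in the first place.

```scala
import scala.util.control.NonFatal

// Hypothetical helper, not part of Flink or Cloudflow.
object LoggedMap {

  // Wraps a function so that any non-fatal exception is written to stderr,
  // together with its stack trace, and then rethrown unchanged.
  def logged[A, B](name: String)(f: A => B): A => B = { a =>
    try f(a)
    catch {
      case NonFatal(e) =>
        System.err.println(s"[$name] failed on input '$a': ${e.getMessage}")
        e.printStackTrace()
        throw e // rethrow so the caller still sees the failure
    }
  }

  def main(args: Array[String]): Unit = {
    // Stand-in for the body of a map step: parse a string into an Int.
    val parse: String => Int = logged[String, Int]("parse")(_.trim.toInt)

    // Valid records pass through untouched.
    println(List("1", " 2 ").map(parse))

    // An invalid record is logged to stderr and the exception still propagates.
    println(scala.util.Try(parse("oops")).isFailure)
  }
}
```

In a streamlet, the wrapped function would be what gets passed to map, so the stack trace is emitted from inside the task before the exception propagates and the job fails as it otherwise would.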

To Reproduce: Modify, for instance, the taxi-ride app to throw exceptions in the processor.

Expected behavior: Exception stack traces appear in the logs, and the pod restarts.

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

Lockdain commented, Oct 29, 2020 (1 reaction)

Hi @franciscolopezsancho

I apologize in advance in case I’ve missed some parts of your investigation.

The interesting question is how Cloudflow, together with the Lyft operator under the hood, handles automatic job recovery when a job fails for some “recoverable” reason. There is a great example available on YouTube - here.

Lockdain commented, Oct 24, 2020 (1 reaction)

Hi @franciscolopezsancho

Thanks for the great investigation.

We’ve also observed faulty behavior when the JM pod restarts. Are you able to walk through the following steps using your example?

  1. Run a Flink job that is correct and doesn’t contain any hardcoded exceptions.
  2. Make a few curl requests to the entry point to pass some records through the job.
  3. Force-delete the job manager pod (as a second case, try shutting it down gracefully).
  4. Wait until K8s schedules and runs a new JM pod.
  5. Look at the logs (new JM, old TMs).

In the default (non-HA) mode the JM is a single point of failure, so the Flink operator relies on the default K8s policies for pod rescheduling. It would also be great to repeat the same scenario against a Flink cluster with the JM running in HA mode.

Thanks in advance for any assistance!


