Exceptions thrown in a Flink streamlet are not logged anywhere.
See original GitHub issue
Describe the bug
Exceptions thrown in a Flink streamlet are not logged anywhere. For instance, throwing an exception in buildExecutionGraph, or inside a
readStream(in).map {
  // throw here
}
does not show up in any logs.
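For context, here is a minimal sketch of a streamlet that would reproduce the behavior. The class name FailingProcessor and the Data Avro record are made up for illustration, and the exact Cloudflow API surface (StreamletShape, AvroInlet/AvroOutlet, writeStream) may differ slightly between versions:

```scala
import cloudflow.flink._
import cloudflow.streamlets._
import cloudflow.streamlets.avro._
import org.apache.flink.streaming.api.scala._

// Hypothetical streamlet: `FailingProcessor` and the `Data` Avro record are
// stand-ins for whatever the real pipeline (e.g. taxi-ride) uses.
class FailingProcessor extends FlinkStreamlet {
  val in    = AvroInlet[Data]("in")
  val out   = AvroOutlet[Data]("out")
  val shape = StreamletShape.withInlets(in).withOutlets(out)

  override def createLogic = new FlinkStreamletLogic {
    override def buildExecutionGraph = {
      val processed = readStream(in).map[Data] { record =>
        // This is the exception that never shows up in any log.
        throw new RuntimeException(s"failed while processing $record")
      }
      writeStream(out, processed)
    }
  }
}
```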
To Reproduce
Modify, for instance, the taxi-ride app so the processor throws an exception.
Expected behavior
Exception stack traces show up in the logs, and the pod restarts.
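Until this is fixed, one possible workaround (my suggestion, not something proposed in the issue) is to log and rethrow inside the user function itself, so the stack trace at least reaches the TaskManager log:

```scala
import org.slf4j.LoggerFactory

// Hypothetical helper: wraps a user function so that any exception it throws
// is logged with its stack trace before being rethrown.
object LoggingOps {
  def logged[A, B](name: String)(f: A => B): A => B = { a =>
    try f(a)
    catch {
      case e: Throwable =>
        // Look up the logger lazily so the returned function stays serializable
        // when Flink ships it to the task managers.
        LoggerFactory.getLogger(name).error(s"Exception in '$name' for element $a", e)
        throw e
    }
  }
}

// Usage inside buildExecutionGraph, e.g.:
//   readStream(in).map(LoggingOps.logged("taxi-ride-processor")(process))
```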
Issue Analytics
- Created 3 years ago
- Comments: 6 (3 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Hi @franciscolopezsancho
I apologize in advance in case I’ve missed some parts of your investigation.
The interesting question is how Cloudflow, together with the Lyft Flink operator, handles automatic job recovery under the hood when a job fails for some “recoverable” reason. There is a great example available on YouTube - here.
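For background (standard Flink behavior, not something stated in this thread): whether a failed job restarts at all is governed by Flink's restart strategy, which defaults to no restarts unless checkpointing is enabled. A sketch of setting it explicitly on a plain StreamExecutionEnvironment:

```scala
import java.util.concurrent.TimeUnit

import org.apache.flink.api.common.restartstrategy.RestartStrategies
import org.apache.flink.api.common.time.Time
import org.apache.flink.streaming.api.scala._

object RestartStrategyExample {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Retry the job up to 3 times, waiting 10 seconds between attempts,
    // instead of failing permanently on the first thrown exception.
    env.setRestartStrategy(
      RestartStrategies.fixedDelayRestart(3, Time.of(10, TimeUnit.SECONDS)))

    env
      .fromElements(1, 2, 3)
      .map { n =>
        if (n == 2) throw new RuntimeException("boom")
        n
      }
      .print()

    env.execute("restart-strategy-example")
  }
}
```

In this toy example the failure is deterministic, so the job still fails once the retries are exhausted; with a transient failure the retries would let it recover.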
Hi @franciscolopezsancho
Thanks for the great investigation.
We’ve also observed faulty behavior when the JM pod restarts. Are you able to walk through the following steps using your example?
In the default (non-HA) mode the JM is a single point of failure, so the Flink operator relies on the default Kubernetes policies for pod re-scheduling. It would also be great to repeat the same scenario against a Flink cluster with the JM running in HA mode.
Thanks in advance for any assistance!
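For reference, ZooKeeper-based JobManager HA is enabled through the standard Flink settings in flink-conf.yaml along the following lines; the quorum hosts, storage path, and cluster id below are placeholders, and none of this comes from the thread itself:

```yaml
# flink-conf.yaml - ZooKeeper-based HA (placeholder values)
high-availability: zookeeper
high-availability.zookeeper.quorum: zk-1:2181,zk-2:2181,zk-3:2181
high-availability.storageDir: s3://my-bucket/flink/ha
high-availability.cluster-id: /taxi-ride-pipeline
```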