Abolish distinction between cancelled and failed Job/Deferred
The distinction between cancelled and failed states for `Job` and `Deferred` shall be abolished. The rationale follows.
- Public documentation for `Job` does not mention the failed state at all. However, using `getCancellationException` one can still indirectly figure out whether a coroutine that was started with `launch` completed normally or with an exception, so hiding that from the `Job` state machine does not really help.
- When a `Job` uses parallel decomposition of its work by launching child coroutines, it accidentally leaks its internal implementation details. When a child coroutine crashes with an exception, it cancels its parent, thereby revealing this failure through the cancelled state of the parent job. However, when the parent itself crashes, it is still technically in the completed state (not cancelled), but see above.
- When the parent coroutine completes exceptionally, it waits for its children before becoming failed. This is not ideal behavior for a parallel decomposition of work. When the parent coroutine crashes, we really want to cancel all the children as soon as possible, since the “work as a whole” was obviously not successful.
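The second point above (a crashing child cancels its parent) can be observed from outside the coroutine. The following is a minimal sketch, assuming a recent kotlinx.coroutines release on the JVM; the function name `parentCancelledByChildCrash` and the no-op handler are illustrative choices, not part of the issue.

```kotlin
import kotlinx.coroutines.*

// Returns the parent's isCancelled flag after one of its children crashes.
suspend fun parentCancelledByChildCrash(): Boolean {
    // A no-op handler (assumption for the demo) keeps the exception off stderr.
    val handler = CoroutineExceptionHandler { _, _ -> }
    val scope = CoroutineScope(Job() + handler)
    val parent = scope.launch {
        launch { throw IllegalStateException("child crashed") } // the child
        delay(Long.MAX_VALUE) // the parent body itself never fails
    }
    parent.join() // join() waits for completion and does not rethrow the failure
    return parent.isCancelled
}

fun main(): Unit = runBlocking {
    println("parent.isCancelled = ${parentCancelledByChildCrash()}")
}
```

The parent's body never throws, yet its job ends up cancelled, which is the implementation detail the rationale says leaks through the state machine.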
All in all, this makes the distinction between a cancelled and a failed `Job` or `Deferred` really irrelevant.
The plan is as follows:
- Deprecate `Deferred.isCompletedExceptionally` in favor of `isCancelled`.
- Deprecate `CompletableDeferred.completeExceptionally` in favor of `cancel`.
Change the behavior for crashed coroutines. When a coroutine crashes with an exception, it shall transition into the cancelling state, cancel all its children, wait for their completion, and then transition into the cancelled state.
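The proposed sequence for a crashed coroutine (cancel the children, wait for them, end up cancelled) can be sketched as follows. This assumes a recent kotlinx.coroutines release, where this behavior appears to be in place; `Outcome` and `crashParentWithChild` are hypothetical names used only for this illustration.

```kotlin
import kotlinx.coroutines.*

// Hypothetical holder for the two flags we want to inspect.
data class Outcome(val childCancelled: Boolean, val parentCancelled: Boolean)

suspend fun crashParentWithChild(): Outcome {
    // No-op handler (assumption for the demo) so the crash is not printed.
    val handler = CoroutineExceptionHandler { _, _ -> }
    val scope = CoroutineScope(Job() + handler)
    var child: Job? = null
    val parent = scope.launch {
        // The child would run forever unless something cancels it.
        child = launch { delay(Long.MAX_VALUE) }
        throw IllegalStateException("parent crashed")
    }
    // join() returns only after the parent and all of its children are done.
    parent.join()
    return Outcome(
        childCancelled = child?.isCancelled == true,
        parentCancelled = parent.isCancelled
    )
}

fun main(): Unit = runBlocking {
    println(crashParentWithChild())
}
```

If the parent merely waited for its children instead of cancelling them, `parent.join()` would never return here, because the child delays indefinitely.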
With this change, the behavior of `CoroutineExceptionHandler` is going to be more consistent. When a standalone coroutine (a coroutine that is started with `launch`) crashes with an exception or is cancelled, it should transition to the cancelling state and invoke `CoroutineExceptionHandler` with the context of the parent coroutine. The default behavior is to cancel the parent. Since the crashed child is already in the cancelling state, it is never going to cause a problem of cancelling the failing child.
Documentation needs to be adjusted appropriately. In particular, the state machine for `Deferred` is going to be identical to that of a `Job`, and its docs can be simpler and more focused on the `Deferred`/`Job` differences.
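To illustrate what an identical state machine means in practice, the sketch below compares the public state flags of a failed `launch` job and a failed `async` deferred. It assumes a recent kotlinx.coroutines release; `Flags`, `flags()` and `failedFlags` are hypothetical helpers introduced here, not library API.

```kotlin
import kotlinx.coroutines.*

// The three public state flags that Job and Deferred share.
data class Flags(val isActive: Boolean, val isCompleted: Boolean, val isCancelled: Boolean)

// Hypothetical helper; works for Deferred too, since Deferred extends Job.
fun Job.flags() = Flags(isActive, isCompleted, isCancelled)

suspend fun failedFlags(): Pair<Flags, Flags> {
    // SupervisorJob keeps one failure from cancelling the sibling coroutine;
    // the no-op handler (assumption for the demo) swallows the launch failure.
    val handler = CoroutineExceptionHandler { _, _ -> }
    val scope = CoroutineScope(SupervisorJob() + handler)
    val job = scope.launch { throw IllegalStateException("launch failed") }
    val deferred = scope.async<Int> { throw IllegalStateException("async failed") }
    job.join()      // join() does not rethrow the failure
    deferred.join() // unlike await(), which would rethrow it
    return job.flags() to deferred.flags()
}

fun main(): Unit = runBlocking {
    val (jobFlags, deferredFlags) = failedFlags()
    println("job:      $jobFlags")
    println("deferred: $deferredFlags")
}
```

Both coroutines failed rather than being cancelled explicitly, and both report the same flags, so the remaining documented difference is that a `Deferred` carries a result retrievable with `await()`.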
#219 shall be implemented first, since `CancellableContinuation` must keep the distinction between the state where it was “resumed with exception” and the state where it was “cancelled”. This is often needed for proper resource management, to avoid closing some external resource when it was already closed and the continuation was cancelled.
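The resource-management concern can be sketched with `suspendCancellableCoroutine`, where the cancellation handler runs only on cancellation and not when the continuation is resumed with an exception. This is a minimal illustration assuming a recent kotlinx.coroutines release; `FakeConnection`, `receive` and the two `closeCountAfter...` functions are hypothetical names invented for the example.

```kotlin
import kotlinx.coroutines.*
import kotlin.coroutines.resumeWithException

// Hypothetical resource, used only for illustration.
class FakeConnection {
    var closeCount = 0
        private set
    fun close() { closeCount++ }
}

suspend fun receive(conn: FakeConnection, fail: Boolean): String =
    suspendCancellableCoroutine { cont ->
        // Invoked only if the continuation is cancelled while suspended.
        cont.invokeOnCancellation { conn.close() }
        if (fail) {
            conn.close() // the connection closes itself on error
            cont.resumeWithException(IllegalStateException("connection failed"))
        }
        // Otherwise stay suspended until cancelled.
    }

// Failure path: the resource closed itself, the handler must not close it again.
suspend fun closeCountAfterFailure(): Int {
    val conn = FakeConnection()
    try {
        receive(conn, fail = true)
    } catch (e: IllegalStateException) {
        // expected in this demo
    }
    return conn.closeCount
}

// Cancellation path: nobody else closed the resource, so the handler does.
suspend fun closeCountAfterCancellation(): Int = coroutineScope {
    val conn = FakeConnection()
    // UNDISPATCHED runs the body up to its first suspension before launch returns.
    val job = launch(start = CoroutineStart.UNDISPATCHED) { receive(conn, fail = false) }
    job.cancelAndJoin()
    conn.closeCount
}

fun main(): Unit = runBlocking {
    println("closes after failure:      ${closeCountAfterFailure()}")
    println("closes after cancellation: ${closeCountAfterCancellation()}")
}
```

In both paths the connection is closed exactly once, which is only possible because the continuation can tell "resumed with exception" apart from "cancelled".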
Top GitHub Comments
We’re back on track. We’ve figured out how to preserve all our channel-related use-cases while abolishing cancelling/failing distinction. Job becomes conceptually so much simpler.
Btw, I’ve tried hard to make it all consistent while still keeping “exceptional completion” distinct from “cancellation”. It just does not click. It becomes a real mess in corner cases like “coroutine that was cancelled, but then crashes with some exception in its finally sections” and “coroutine that has crashed, but was cancelled while waiting for its children to complete”. It becomes much easier to grasp when “exceptional completion” and “cancellation” are simply the same concept.