RFC: Retryable Tasks
- Start Date: 2017-11-04
- RFC PR: (leave this empty)
- ember-concurrency issue: (leave this empty)
Summary
Sometimes networks are bad. Sometimes computers do bad things. Sometimes solar flares happen. Failure happens. EC already has some great derived state around failure and allows task authors to handle failure as they see fit. This is great for fatal errors where we just want to log the error, maybe show a message to the user, and maybe even give them an option to retry the operation. But what about intermittent failures? What about routine, boring failures? The failures we might be able to recover from without user intervention if we just wait a bit and then retry? This RFC proposes adding functionality to allow for automatic retries of tasks.
Motivation
As long as web apps need the network, network errors will exist. Sometimes they’re blips. In the case of mobile internet, intermittent network errors are just part of the reality of such distributed connectivity. There are use cases where user intervention should not be needed and the user may not even need to know that something has gone wrong underneath.
Some use-cases for automatic retrying of tasks:
- Long-polling tasks
- Background auto-saves
- Telemetry/analytics
- Integrations w/ 3rd party APIs that don’t quite have five 9s (or three)
- Applications that deal with periodic resource contention
  - e.g. the app talks to a cloud storage API where files might be locked for a few seconds while some other system is working on them
- Anything async that is prone to transient errors
  - e.g. transient Geolocation API failures while the user is in transit
Consider one approach to solving this now:
```js
import Ember from 'ember';
import { task } from 'ember-concurrency';
import { getWidgets } from '../utils/widgets';

const { Component, run: { later } } = Ember;

const RETRY_DELAYS = [500, 3000, 30000, 50000];

export default Component.extend({
  _retryAttempt: 0,

  fetchLatest: task(function* () {
    try {
      // Call to unreliable 3rd party API
      const data = yield getWidgets();
      this.set('_retryAttempt', 0);
      return data;
    } catch (e) {
      // Ewwww...
      const retryAttempt = this.get('_retryAttempt');
      if (retryAttempt < RETRY_DELAYS.length) {
        later(this, () => {
          this.get('fetchLatest').perform();
        }, RETRY_DELAYS[retryAttempt]);
        this.incrementProperty('_retryAttempt');
      }
      throw e;
    }
  }).enqueue()
});
```
We’re adding more component state that we have to remember to reset once a retry succeeds, and we’re adding additional boilerplate around the error handling.
Detailed design
Scheduler additions
The scheduler includes a concept of buffer policies, which underpin the various property modifiers like `.drop()`, `.enqueue()`, etc. and dictate how the scheduler should control the scheduling of additional task instances.
Based on this pattern, this RFC proposes adding a concept of retry policies, which would back additional property modifiers for configurable retry behavior a la the resque-retry gem.
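To make the analogy concrete, a retry policy could be modeled as a small object the scheduler consults whenever a task instance errors. The sketch below is purely illustrative: the class and method names (`DelayedRetryPolicy`, `shouldRetry`, `delayFor`) are assumptions made for this RFC, not existing ember-concurrency API.
```js
// Hypothetical shape of a retry policy the scheduler might consult after a
// task instance throws. None of these names exist in ember-concurrency today.
class DelayedRetryPolicy {
  constructor({ delay = [500, 1500, 3000, 6000], reasons = [] } = {}) {
    this.delays = Array.isArray(delay) ? delay : [delay];
    this.reasons = reasons;
  }

  // Should this particular failure be retried at all?
  // (Only the "array of Error classes" flavor of `reasons` is handled here.)
  shouldRetry(error, retryCount) {
    const reasonMatches =
      this.reasons.length === 0 ||
      this.reasons.some((Reason) => error instanceof Reason);
    return reasonMatches && retryCount < this.delays.length;
  }

  // How long to wait before performing the next attempt.
  delayFor(retryCount) {
    return this.delays[Math.min(retryCount, this.delays.length - 1)];
  }
}
```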
Proposed public API additions
Example usage from an app developer’s perspective:
```js
import Ember from 'ember';
import { task } from 'ember-concurrency';
import DS from 'ember-data';

const { AdapterError } = DS;
const { Component } = Ember;
// ... other imports

export default Component.extend({
  backgroundTask: task(function* () {
    // If this fails, it will be retried once after 5s.
    yield someUnreliableAsyncThing();
  }).drop().retryable({ delay: 5000 }),

  backgroundTaskWithRetryReasons: task(function* () {
    // If this fails and the reason is a
    // SomeError or AdapterError, it will be retried.
    yield someUnreliableAsyncThing();
  }).drop().retryable({ reasons: [SomeError, AdapterError] }),

  backgroundTaskWithExponentialBackoff: task(function* () {
    // If this fails, it will be retried after
    // 2.5s, 10s, 50s, and 3m.
    yield someUnreliableAsyncThing();
  }).drop().retryable({ delay: [2500, 10000, 50000, 180000] })
});
```
As shown in the example, the proposed public API addition would be a `.retryable()` task property modifier that would attach a retry policy to the task.
An interface definition for a retry policy:
```ts
interface RetryPolicy {
  delay: number | Array<number>;
  reasons: Array<string> | Array<Error>; // maybe just Array<any>?
}
```
Perhaps, without arguments, `.retryable()` might have a default `delay` of `[500, 1500, 3000, 6000]` or something else considered sensible.
Additionally, it might be nice to add some derived state. A useful piece of derived state might be `isRetrying`, which would complement `isRunning` but explicitly indicate whether the task is retrying rather than simply running normally. The user may want to know that something is being retried, and such a property would make it easier for application developers to present separate messaging, for example. In addition, a `retryCount` read-only property could be added to indicate the number of times a task has been retried.
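For example, a component might use this derived state to present retry-specific messaging. In the sketch below, `isRetrying` and `retryCount` are the properties proposed above (they do not exist on tasks today), and `someUnreliableAsyncThing` is an assumed helper:
```js
import Ember from 'ember';
import { task } from 'ember-concurrency';

const { Component, computed } = Ember;

export default Component.extend({
  fetchLatest: task(function* () {
    yield someUnreliableAsyncThing(); // assumed helper, as in the examples above
  }).drop().retryable({ delay: [500, 1500, 3000] }),

  // `isRetrying` and `retryCount` are the derived state proposed in this RFC,
  // not existing ember-concurrency API.
  statusMessage: computed('fetchLatest.{isRetrying,retryCount}', function () {
    if (this.get('fetchLatest.isRetrying')) {
      return `Retrying... (attempt ${this.get('fetchLatest.retryCount')})`;
    }
    return null;
  })
});
```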
Interactions with buffer policies (& other property modifiers)
Retry policies should work alongside existing property modifiers such as `.drop()`, `.keepLatest()`, etc. in expected ways. Task instances retried automatically via `.retryable()` should be scheduled according to any attached buffer policy modifiers, so that tasks continue to work as users expect (see the sketch after the list below).
When the retry timer is up:
- Tasks with `.drop()` should drop the attempt to retry the task if `maxConcurrency` has been reached.
- Tasks with `.keepLatest()` should continue the latest currently running task (if any) and enqueue the retry.
- Tasks with `.restartable()` should cancel any running tasks and continue with the retry.
- Tasks with `.enqueue()` should add the retry to the queue.
- Tasks without other modifiers should begin executing the retry.
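As a concrete illustration of the intended semantics, here is a hypothetical long-polling task combining `.restartable()` with the proposed `.retryable()` modifier (`fetchUpdates` is an assumed helper, and the retry behavior described in the comment is what this RFC proposes, not current behavior):
```js
import Ember from 'ember';
import { task, timeout } from 'ember-concurrency';

const { Component } = Ember;

const POLL_INTERVAL = 10000;

export default Component.extend({
  // Proposed semantics: if the task throws, the scheduler waits 2s and then,
  // because the task is restartable, cancels any instance that happens to be
  // running and performs the retry in its place.
  pollForUpdates: task(function* () {
    while (true) {
      yield fetchUpdates(); // assumed helper that may reject intermittently
      yield timeout(POLL_INTERVAL);
    }
  }).restartable().retryable({ delay: 2000 })
});
```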
Like `.drop()` and others, `.retryable()` should not be usable with `.group()`.
How We Teach This
We could document it similarly to how other property modifiers are documented, i.e. with an animated, interactive example showing a visual representation of the retry behavior. Everyone loves animation.
Drawbacks
It adds additional complexity to the scheduler and adds another property modifier that needs to be documented, tested, and supported.
Alternatives
The primary alternative would be to continue to allow application developers to build such behavior themselves (possibly using EC tasks!).
Retryable tasks could also be implemented as a task wrapper by wrapping the task computed property or by providing some reusable pattern via encapsulated tasks.
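As a rough sketch of the wrapper idea under the assumptions of this RFC (the `retryableTask` helper and its options are invented for illustration and are not an existing API), the wrapper could delegate to the original generator function and handle the retry loop itself using ember-concurrency’s `timeout`:
```js
import { task, timeout } from 'ember-concurrency';

// Hypothetical helper: wraps a generator function in a retry loop and returns
// an ordinary ember-concurrency task property that modifiers can be chained onto.
export function retryableTask(taskFn, { delays = [500, 1500, 3000] } = {}) {
  return task(function* (...args) {
    for (let attempt = 0; ; attempt++) {
      try {
        return yield* taskFn.apply(this, args);
      } catch (e) {
        if (attempt >= delays.length) {
          throw e;
        }
        // Waiting via `timeout` keeps the retry delay cancelable along with the task.
        yield timeout(delays[attempt]);
      }
    }
  });
}
```
A component could then declare something like `fetchLatest: retryableTask(function* () { /* ... */ }).enqueue()`, keeping the retry bookkeeping out of the task body.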
In some cases, application authors can also use Service Workers to handle network failures and return cached responses or enqueue failures for later retry, effectively masking the failure from the application. However, Service Workers come with their own set of complexities & limitations, and do not address non-network use-cases.
Unresolved questions
- Does this belong in EC or an add-on?
  - If an add-on, this is probably not possible via public API when going for the wrapper approach. For example, the wrapper around `task` may require access to `taskFn` in order to wrap the generator function.
Top GitHub Comments
Finally got around to working this into an addon called ember-concurrency-retryable based on the approach suggested. Haven’t worked in any derivable state (not sure if it’s possible yet via an addon), but the important behavior is there, and it’s built on e-c’s `timeout` and generators, so it should play nicely with cancellation. Going to close out the RFC for now.
Nope, I don’t think so. I was mostly curious about the why rather than the how. It sounds like your concern is with scope creep of E-C, which is totally valid 👍 (though, I will let @machty make the determination of what is/isn’t within scope).
Not so much a blocker with your approach to the add-on, but the reliance on a private module import to add the property modifier to the `TaskProperty` prototype gives me pause, which is why I suggested a more modest addition to core (relative to the original RFC) of a hook into `TaskInstance`. (But I may be getting ahead of myself even more, suggesting hooks into E-C; topic for a different RFC, maybe.) I appreciate the feedback! I’ll play around with the add-on approach a bit 😎