RFC: Retryable Tasks

Start Date: 2017-11-04
RFC PR: (leave this empty)
ember-concurrency issue: (leave this empty)

Summary

Sometimes networks are bad. Sometimes computers do bad things. Sometimes solar flares happen. Failure happens. ember-concurrency (EC) already has some great derived state around failure and lets task authors handle failure as they see fit. This is great for fatal errors, where we just want to log the error, maybe show a message to the user, and maybe even give them an option to retry the operation. But what about intermittent failures? What about routine, boring failures that we might be able to recover from without user intervention if we just wait a bit and then retry? This RFC proposes adding functionality to allow automatic retries of tasks.

Motivation

As long as web apps need the network, network errors will exist. Sometimes they’re just blips. On mobile internet, intermittent network errors are simply part of the reality of that kind of connectivity. There are use cases where user intervention should not be needed, and the user may not even need to know that something went wrong underneath.

Some use-cases for automatic retrying of tasks:

- Long-polling tasks
- Background auto-saves
- Telemetry/analytics
- Integrations with 3rd party APIs that don’t quite have five 9s (or three)
- Applications that deal with periodic resource contention
  - e.g. an app talks to a cloud storage API in which files might be locked for a few seconds while some other system is working on them
- Anything async that is prone to transient errors
  - e.g. transient Geolocation API failures while the user is in transit

Consider one approach to solving this now:

import Ember from 'ember';
import { task } from 'ember-concurrency';
import { getWidgets } from '../utils/widgets';

const { Component, run: { later } } = Ember;
const RETRY_DELAYS = [500, 3000, 30000, 50000];

export default Component.extend({
    _retryAttempt: 0,

    fetchLatest: task(function* () {
        try {
            // Call to unreliable 3rd party API
            const data = yield getWidgets();
            this.set('_retryAttempt', 0);
            return data;
        } catch (e) {
            // Ewwww...
            const retryAttempt = this.get('_retryAttempt');
            
            if (retryAttempt < RETRY_DELAYS.length) {
                later(this, () => {
                    this.get('fetchLatest').perform();
                }, RETRY_DELAYS[retryAttempt]);
                this.incrementProperty('_retryAttempt');
            }

            throw e; 
        } 
    }).enqueue()
});

We’re adding more state (_retryAttempt) that we have to remember to reset once a retry succeeds. We’re also adding boilerplate around the error handling.

Detailed design

Scheduler additions

The scheduler has a concept of buffer policies. These underpin property modifiers like .drop() and .enqueue(), and they dictate how the scheduler handles additional task instances.

Following this pattern, this RFC proposes adding a concept of retry policies. Retry policies would enable additional property modifiers for configurable retry behavior, à la the resque-retry gem.

Proposed public API additions

For example, from an app developer’s perspective:

import Ember from 'ember';
import { task } from 'ember-concurrency';
import DS from 'ember-data';
// ... other imports (SomeError, someUnreliableAsyncThing, etc.)

const { Component } = Ember;
const { AdapterError } = DS;

export default Component.extend({
    backgroundTask: task(function* () {
        // If this fails, it will be retried once after 5s.
        yield someUnreliableAsyncThing(); 
    }).drop().retryable({ delay: 5000 }),
    
    backgroundTaskWithRetryReasons: task(function* () {
        // If this fails and the reason is a
        // SomeError or AdapterError, it will be retried.
        yield someUnreliableAsyncThing(); 
    }).drop().retryable({ reasons: [SomeError, AdapterError] }),
    
    backgroundTaskWithExponentialBackoff: task(function* () {
        // If this fails, it will be retried after
        // 2.5s, 10s, 50s, and 3m.
        yield someUnreliableAsyncThing();
    }).drop().retryable({ delay: [2500, 10000, 50000, 180000] })
});

As shown in the example, the proposed public API addition would be a .retryable task property modifier that would attach a retry policy to the task.

An interface definition for a retry policy:

interface RetryPolicy {
    delay?: number | Array<number>;      // one delay, or one delay per retry attempt (ms)
    reasons?: Array<string | Function>;  // error names or error classes; maybe just Array<any>?
}

Perhaps, when called without arguments, .retryable() could default to a delay of [500, 1500, 3000, 6000] or something else considered sensible.
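To make the proposed options concrete, here is a minimal sketch of how a retry policy might be interpreted. It assumes the following rules. A single number means "retry once after that many milliseconds." An array means one delay per retry attempt. Strings in reasons are matched against error.name, and functions are treated as error classes. The helper names (normalizeRetryPolicy, matchesReason, nextRetryDelay) and DEFAULT_DELAYS are hypothetical and are not part of ember-concurrency.

```javascript
// Hypothetical default, mirroring the "sensible default" floated above (not a real EC constant).
const DEFAULT_DELAYS = [500, 1500, 3000, 6000];

// Normalize the options passed to .retryable() into a delay list (ms) and a reasons list.
// A single number becomes a one-element list, i.e. "retry once after N ms".
function normalizeRetryPolicy(options = {}) {
  const { delay = DEFAULT_DELAYS, reasons = [] } = options;
  const delays = Array.isArray(delay) ? delay.slice() : [delay];
  return { delays, reasons };
}

// Does the error match one of the configured reasons?
// Strings are compared with error.name; functions are treated as error classes.
// An empty reasons list means "retry on any error".
function matchesReason(policy, error) {
  if (policy.reasons.length === 0) return true;
  return policy.reasons.some((reason) =>
    typeof reason === 'string'
      ? error != null && error.name === reason
      : typeof reason === 'function' && error instanceof reason
  );
}

// Delay (ms) before retry number `attempt` (0-based), or null if the task should give up.
function nextRetryDelay(policy, error, attempt) {
  if (attempt >= policy.delays.length || !matchesReason(policy, error)) return null;
  return policy.delays[attempt];
}
```

Under this reading, a task gives up once it has used every configured delay or once it hits an error that matches none of the reasons.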

Additionally, it might be nice to add some derived state. One useful piece would be isRetrying, which would complement isRunning but explicitly indicate whether the task is retrying rather than simply running normally. Users may want to know that something is being retried, and such a property would make it easier for application developers to present separate messaging, for example. In addition, a read-only retryCount property could indicate the number of times a task has been retried.
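The following is a small, hypothetical model of how isRetrying and retryCount might behave. It assumes retryCount resets on each fresh perform(). It also assumes isRetrying is true only while a retry attempt, not the first attempt, is actually executing. Neither property exists in ember-concurrency today, and RetryStateSketch and its methods are illustrative names only.

```javascript
// Hypothetical model of the proposed derived state (not an ember-concurrency API).
class RetryStateSketch {
  constructor() {
    this.isRunning = false;
    this._retryCount = 0;
  }

  // Read-only: number of retries started since the last fresh perform().
  get retryCount() {
    return this._retryCount;
  }

  // True only while a retry attempt (not the initial attempt) is executing.
  get isRetrying() {
    return this.isRunning && this._retryCount > 0;
  }

  // A fresh perform() starts the initial attempt and resets the counter.
  perform() {
    this.isRunning = true;
    this._retryCount = 0;
  }

  // Called when a retry timer fires and the retry actually starts running.
  beginRetry() {
    this.isRunning = true;
    this._retryCount += 1;
  }

  // Either outcome ends the current attempt; the counter is kept for inspection.
  fail() {
    this.isRunning = false;
  }

  succeed() {
    this.isRunning = false;
  }
}
```

Whether isRetrying should also be true while the task waits out the delay is an open design choice. This sketch treats the waiting period as not running.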

Interactions with buffer policies (& other property modifiers)

Retry policies should compose with existing property modifiers such as .drop(), .keepLatest(), etc. Tasks retried automatically via .retryable() should be scheduled according to any attached buffer policy modifier, so that tasks continue to behave the way users expect.

When the retry timer fires:

- Tasks with .drop() should drop the retry attempt if maxConcurrency has been reached.
- Tasks with .keepLatest() should let the latest running task instance (if any) continue and enqueue the retry.
- Tasks with .restartable() should cancel any running task instances and proceed with the retry.
- Tasks with .enqueue() should add the retry to the queue.
- Tasks without other modifiers should begin executing the retry.
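These rules can be summarized as a decision function. This is only a sketch. onRetryTimerFired, its argument shape, and the returned action strings are all hypothetical, and the real scheduler would act on task instances rather than return labels. Following the usual buffer semantics, the sketch assumes an enqueued retry starts right away when there is spare capacity.

```javascript
// Hypothetical decision table for when a retry timer fires (not EC internals).
// bufferPolicy: 'drop' | 'keepLatest' | 'restartable' | 'enqueue' | undefined/null (no modifier)
function onRetryTimerFired({ bufferPolicy, runningCount, maxConcurrency = 1 }) {
  const atCapacity = runningCount >= maxConcurrency;
  switch (bufferPolicy) {
    case 'drop':
      // Drop the retry rather than exceed maxConcurrency.
      return atCapacity ? 'drop-retry' : 'run-retry';
    case 'keepLatest':
      // Let running instances finish; the retry becomes the (single) queued instance.
      return atCapacity ? 'enqueue-retry' : 'run-retry';
    case 'restartable':
      // Cancel whatever is running, then run the retry.
      return runningCount > 0 ? 'cancel-running-then-run-retry' : 'run-retry';
    case 'enqueue':
      // Queue the retry; with spare capacity the queue starts it immediately.
      return atCapacity ? 'enqueue-retry' : 'run-retry';
    case undefined:
    case null:
      // No buffer policy: just run the retry.
      return 'run-retry';
    default:
      throw new Error(`Unknown buffer policy: ${bufferPolicy}`);
  }
}
```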

As with .drop() and the other buffer policy modifiers, .retryable() should not be usable together with .group().

How We Teach This

We could document it the same way other property modifiers are documented, i.e. with an animated, interactive example that visually represents the retry behavior. Everyone loves animation.

Drawbacks

It adds additional complexity to the scheduler and adds another property modifier that needs to be documented, tested, and supported.

Alternatives

The primary alternative would be to continue letting application developers build such behavior themselves (possibly using EC tasks!).

Retryable tasks could also be implemented as a task wrapper by wrapping the task computed property or by providing some reusable pattern via encapsulated tasks.
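Here is a minimal sketch of the wrapper alternative. It assumes a task function is a generator that yields promises, as ember-concurrency task functions do. withRetries and its injected sleep option are hypothetical names. Inside an Ember app, ember-concurrency’s timeout helper could play the role of sleep, and the wrapped generator function could then be handed to task() like any other.

```javascript
// Hypothetical wrapper: re-runs a generator-based task function after each failure,
// waiting delays[attempt] ms between attempts via the injected sleep(ms) promise factory.
function withRetries(taskFn, { delays = [500, 1500, 3000, 6000], sleep } = {}) {
  if (typeof sleep !== 'function') {
    throw new TypeError('withRetries needs a sleep(ms) function that returns a promise');
  }
  return function* retryingTaskFn(...args) {
    for (let attempt = 0; ; attempt++) {
      try {
        // Delegate to a fresh run of the original generator, preserving `this` and args.
        return yield* taskFn.apply(this, args);
      } catch (error) {
        // Out of retries: surface the error exactly as the unwrapped task would.
        if (attempt >= delays.length) throw error;
        // Wait before the next attempt.
        yield sleep(delays[attempt]);
      }
    }
  };
}
```

Delegating with yield* means each attempt starts a fresh run of the original generator, and the error from the final attempt propagates unchanged.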

In some cases, application authors can also use Service Workers to handle network failures and return cached responses or enqueue failures for later retry, effectively masking the failure from the application. However, Service Workers come with their own set of complexities & limitations, and do not address non-network use-cases.

Unresolved questions

Does this belong in EC or an add-on?
- If an add-on, the wrapper approach would probably not be possible via the public API. For example, a wrapper around task may require access to taskFn to wrap the generator function.

Issue Analytics

Created 6 years ago
Reactions: 8
Comments: 8

Top GitHub Comments

3 reactions

maxfierke commented, Mar 19, 2018

Finally got around to working this into an addon called ember-concurrency-retryable based on the approach suggested.

Haven’t worked in any derivable state (not sure if it’s possible yet via an addon), but the important behavior is there, and it’s built on e-c’s timeout and generators, so it should play nicely with cancellation.

Going to close out the RFC for now

1 reaction

maxfierke commented, Nov 20, 2017

Nope, I don’t think so. I was mostly curious about the why rather than the how. It sounds like your concern is with scope creep of E-C, which is totally valid 👍 (though, I will let @machty make the determination of what is/isn’t within scope).

Not so much a blocker with your approach to the add-on, but the reliance on a private module import to add the property modifier to the TaskProperty prototype gives me pause, which is why I suggested a more modest addition to core (relative to the original RFC) of a hook into TaskInstance. (but, I may be getting ahead of myself even more, suggesting hooks into E-C; topic for a different RFC, maybe)

I appreciate the feedback! I’ll play around with the add-on approach a bit 😎