
RFC: Retryable Tasks

  • Start Date: 2017-11-04
  • RFC PR: (leave this empty)
  • ember-concurrency issue: (leave this empty)

Summary

Sometimes networks are bad. Sometimes computers do bad things. Sometimes solar flares happen. Failure happens. EC (ember-concurrency) already has some great derived state around failure and lets task authors handle failure as they see fit. This works well for fatal errors, where we just want to log the error, maybe show a message to the user, and perhaps give them an option to retry the operation. But what about intermittent failures? What about routine, boring failures that we could recover from without user intervention if we just waited a bit and then retried? This RFC proposes adding functionality that allows tasks to be retried automatically.

Motivation

As long as web apps need the network, network errors will exist. Sometimes they’re blips. In the case of mobile internet, intermittent network errors are just part of the reality of such distributed connectivity. There are use cases where user intervention should not be needed and the user may not even need to know that something has gone wrong underneath.

Some use-cases for automatic retrying of tasks:

  • Long-polling tasks
    • Background auto-saves
  • Telemetry/analytics
  • Integrations w/ 3rd party APIs that don’t quite have five 9s (or three)
  • Applications that deal with periodic resource contention
    • e.g. an app talks to a cloud storage API in which files might be locked for a few seconds while some other system is working on them.
  • Anything async that is prone to transient errors
    • e.g. transient Geolocation API failures while user is in transit

Consider one approach to solving this now:

import Ember from 'ember';
import { task } from 'ember-concurrency';
import { getWidgets } from '../utils/widgets';

const { Component, run: { later } } = Ember;
const RETRY_DELAYS = [500, 3000, 30000, 50000];

export default Component.extend({
    _retryAttempt: 0,

    fetchLatest: task(function* () {
        try {
            // Call to unreliable 3rd party API
            const data = yield getWidgets();
            this.set('_retryAttempt', 0);
            return data;
        } catch (e) {
            // Ewwww...
            const retryAttempt = this.get('_retryAttempt');
            
            if (retryAttempt < RETRY_DELAYS.length) {
                later(this, () => {
                    this.get('fetchLatest').perform();
                }, RETRY_DELAYS[retryAttempt]);
                this.incrementProperty('_retryAttempt');
            }

            throw e; 
        } 
    }).enqueue()
});

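We’re adding extra state (_retryAttempt) that has to be reset once a retry succeeds, along with extra boilerplate around the error handling. The counter is also only reset on success, so once the retries are exhausted, later calls to fetchLatest will never retry again.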
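One way to avoid the component-level counter is to keep the attempt count local to a loop, so there is nothing to reset. The sketch below is plain JavaScript, not ember-concurrency API. `retryWithDelays` and the injectable `sleep` parameter are hypothetical names. Inside a real task, the wait would instead be `yield timeout(ms)` using ember-concurrency's `timeout` helper.

```javascript
// Hypothetical helper (not part of ember-concurrency): call `operation` and,
// on failure, wait delays[i] before attempt i + 1. The attempt counter lives
// in the loop, so no instance state has to be reset after a success.
async function retryWithDelays(operation, delays, sleep = defaultSleep) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation(attempt);
    } catch (error) {
      // Out of retries: surface the last error to the caller.
      if (attempt >= delays.length) {
        throw error;
      }
      await sleep(delays[attempt]);
    }
  }
}

// Timer-based wait; tests can pass a no-op sleep instead.
function defaultSleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Same schedule as the component example above.
const RETRY_DELAYS = [500, 3000, 30000, 50000];
```

Usage would look like `retryWithDelays(() => getWidgets(), RETRY_DELAYS)`. Each call starts with a fresh counter, so exhausting the retries once does not disable retries for later calls.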

Detailed design

Scheduler additions

The scheduler includes a concept of buffer policies, which underpin the various property modifiers like .drop(), .enqueue(), etc. and dictate how the scheduler should control the scheduling of additional task instances.

Following this pattern, this RFC proposes adding a concept of retry policies, exposed through an additional property modifier that attaches configurable retry behavior, à la the resque-retry gem.

Proposed public API additions

Example usage from an app developer’s perspective:

import Ember from 'ember';
import { task } from 'ember-concurrency';
import DS from 'ember-data';
// ... other imports (someUnreliableAsyncThing, SomeError, etc.)

const { Component } = Ember;
const { AdapterError } = DS;

export default Component.extend({
    backgroundTask: task(function* () {
        // If this fails, it will be retried once after 5s.
        yield someUnreliableAsyncThing();
    }).drop().retryable({ delay: 5000 }),

    backgroundTaskWithRetryReasons: task(function* () {
        // If this fails and the reason is a
        // SomeError or AdapterError, it will be retried.
        yield someUnreliableAsyncThing();
    }).drop().retryable({ reasons: [SomeError, AdapterError] }),

    backgroundTaskWithExponentialBackoff: task(function* () {
        // If this fails, it will be retried after
        // 2.5s, 10s, 50s, and 3m.
        yield someUnreliableAsyncThing();
    }).drop().retryable({ delay: [2500, 10000, 50000, 180000] })
});

As shown in the example, the proposed public API addition would be a .retryable task property modifier that would attach a retry policy to the task.

An interface definition for a retry policy:

interface RetryPolicy {
    delay: number | Array<number>;
    reasons: Array<string> | Array<Error>; // maybe just Array<any>?
}

Called without arguments, .retryable() could perhaps default to a delay of [500, 1500, 3000, 6000], or whatever schedule is considered sensible.
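The retry policy could be normalized and queried roughly as shown below. This is a minimal sketch under stated assumptions. `normalizeRetryPolicy`, `matchesReason` and `retryDelayFor` are hypothetical names. The RFC does not define what a string entry in `reasons` means, so matching strings against `error.name` is our assumption. Function entries are treated as Error classes and checked with `instanceof`.

```javascript
// Hypothetical default, mirroring the suggestion above.
const DEFAULT_DELAYS = [500, 1500, 3000, 6000];

// Normalize the options passed to .retryable() into one shape.
function normalizeRetryPolicy(options = {}) {
  const { delay = DEFAULT_DELAYS, reasons = [] } = options;
  return {
    // A single number means "retry once after that many ms".
    delays: Array.isArray(delay) ? delay.slice() : [delay],
    reasons,
  };
}

// Does `error` match one of the reasons? An empty list matches everything.
// Assumption: functions are Error classes (instanceof check), and strings are
// compared against error.name.
function matchesReason(reasons, error) {
  if (reasons.length === 0) return true;
  return reasons.some((reason) =>
    typeof reason === "function"
      ? error instanceof reason
      : error != null && error.name === reason
  );
}

// Delay in ms before retry number `attempt` (0-based), or null to give up.
function retryDelayFor(policy, error, attempt) {
  if (!matchesReason(policy.reasons, error)) return null;
  return attempt < policy.delays.length ? policy.delays[attempt] : null;
}
```

Collapsing `delay: 5000` into `[5000]` lets the scheduler handle the single-delay form and the backoff-array form the same way.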

Additionally, it might be nice to add some derived state. One useful piece would be isRetrying, which would complement isRunning by explicitly indicating whether the task is retrying rather than simply running normally. The user may want to know that something is being retried, and such a property would make it easier for application developers to, for example, present separate messaging. In addition, a read-only retryCount property could indicate the number of times a task has been retried.
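The intended semantics of these two properties can be pinned down with a small framework-free model. `RetryableRun` is a hypothetical class, not ember-concurrency code. We also assume that isRetrying turns true as soon as the first attempt fails, so it stays true while the task waits and re-attempts. We assume retryCount counts retries started in the current perform.

```javascript
// Hypothetical model of the proposed derived state (not ember-concurrency code).
class RetryableRun {
  constructor(operation, delays, sleep) {
    this.operation = operation;
    this.delays = delays;
    this.sleep = sleep;
    this.isRunning = false;
    this.isRetrying = false;
    this.retryCount = 0;
  }

  async perform() {
    this.isRunning = true;
    this.isRetrying = false;
    this.retryCount = 0;
    try {
      for (let attempt = 0; ; attempt++) {
        try {
          return await this.operation();
        } catch (error) {
          if (attempt >= this.delays.length) throw error;
          // From the first failure on, the task is "retrying":
          // both while waiting and during later attempts.
          this.isRetrying = true;
          await this.sleep(this.delays[attempt]);
          this.retryCount++;
        }
      }
    } finally {
      // Derived flags reset when the perform settles; retryCount is kept.
      this.isRunning = false;
      this.isRetrying = false;
    }
  }
}
```

With this model, a template could show "Retrying…" whenever isRetrying is true and a generic spinner when only isRunning is true.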

Interactions with buffer policies (& other property modifiers)

Retry policies should work alongside existing property modifiers such as .drop(), .keepLatest(), etc. in the expected ways. Tasks retried automatically via .retryable() should be scheduled according to any attached buffer policy modifiers, so that tasks continue to behave as users expect.

When the retry timer is up:

  • Tasks with .drop() should drop the attempt to retry the task if maxConcurrency has been reached.
  • Tasks with .keepLatest() should continue the latest currently running task (if any) and enqueue the retry.
  • Tasks with .restartable() should cancel any running tasks and continue with the retry.
  • Tasks with .enqueue() should add the retry to the queue.
  • Tasks without other modifiers should begin executing the retry.
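The rules above can be written as a small decision function. This sketch is hypothetical. `onRetryTimer`, its string results and the `maxConcurrency` default of 1 are illustrative assumptions. They do not describe the ember-concurrency scheduler's internals. "At capacity" here means the running count has reached maxConcurrency. With the default of 1, that simply means "something is already running".

```javascript
// Hypothetical model: what a scheduler could do when a retry timer fires.
// `modifier` is the task's buffer policy (null/undefined for none).
function onRetryTimer(modifier, runningCount, maxConcurrency = 1) {
  const atCapacity = runningCount >= maxConcurrency;
  switch (modifier) {
    case "drop":
      // Drop the retry attempt if maxConcurrency has been reached.
      return atCapacity ? "drop-retry" : "run-retry";
    case "keepLatest":
      // Let the running instance continue and queue the retry behind it.
      return atCapacity ? "enqueue-retry" : "run-retry";
    case "restartable":
      // Cancel what is running and start the retry.
      return atCapacity ? "cancel-running-then-run-retry" : "run-retry";
    case "enqueue":
      // Add the retry to the queue (it runs at once if there is room).
      return atCapacity ? "enqueue-retry" : "run-retry";
    case null:
    case undefined:
      // No buffer policy: just begin executing the retry.
      return "run-retry";
    default:
      throw new Error(`unsupported modifier: ${modifier}`);
  }
}
```

Treating .enqueue() and .keepLatest() alike in this sketch is a simplification. With a real .keepLatest(), a newer queued retry would also replace any older queued instance.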

Like .drop() and others, .retryable() should not be able to be used with .group().

How We Teach This

We could document it similarly to how other property modifiers are documented, e.g. with an animated, interactive example that visualizes the retry behavior. Everyone loves animation.

Drawbacks

It adds additional complexity to the scheduler and adds another property modifier that needs to be documented, tested, and supported.

Alternatives

The primary alternative would be to continue letting application developers build such behavior themselves (possibly using EC tasks!).

Retryable tasks could also be implemented as a task wrapper by wrapping the task computed property or by providing some reusable pattern via encapsulated tasks.
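Because EC tasks are generator functions, the wrapper approach can be sketched as a higher-order generator function. It re-runs the wrapped generator with `yield*` and yields a delay between attempts. `withRetries` is a hypothetical name. `runCoroutine` is only a minimal stand-in for ember-concurrency's task runner, added so the sketch can run outside Ember. In a real task, `sleep` would be ember-concurrency's `timeout`.

```javascript
// Hypothetical wrapper: returns a generator function that delegates to
// `taskFn` and, if it throws, yields a delay promise and runs it again.
function withRetries(taskFn, delays, sleep) {
  return function* (...args) {
    for (let attempt = 0; ; attempt++) {
      try {
        // yield* forwards each yielded value to the runner, and forwards any
        // error the runner throws back in to the wrapped generator.
        return yield* taskFn.apply(this, args);
      } catch (error) {
        if (attempt >= delays.length) throw error;
        yield sleep(delays[attempt]);
      }
    }
  };
}

// Minimal stand-in for a coroutine runner: await each yielded value, then
// resume the generator with the result or throw the rejection back in.
async function runCoroutine(genFn, ...args) {
  const iterator = genFn(...args);
  let step = iterator.next();
  while (!step.done) {
    let value;
    try {
      value = await step.value;
    } catch (error) {
      step = iterator.throw(error);
      continue;
    }
    step = iterator.next(value);
  }
  return step.value;
}
```

Waiting happens via a `yield`, not a detached timer. So a runner that cancels the generator during the delay (as EC does on cancellation) would also prevent any further attempts.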

In some cases, application authors can also use Service Workers to handle network failures and return cached responses or enqueue failures for later retry, effectively masking the failure from the application. However, Service Workers come with their own set of complexities & limitations, and do not address non-network use-cases.

Unresolved questions

  • Does this belong in EC or an add-on?
    • If an add-on, this is probably not possible via the public API alone if going for the wrapper approach. For example, a wrapper around task may require access to taskFn in order to wrap the generator function.

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Reactions: 8
  • Comments: 8

Top GitHub Comments

3 reactions
maxfierke commented, Mar 19, 2018

Finally got around to working this into an addon called ember-concurrency-retryable based on the approach suggested.

Haven’t worked in any derivable state (not sure if it’s possible yet via an addon), but the important behavior is there, and it’s built on e-c’s timeout and generators, so it should play nicely with cancellation.

Going to close out the RFC for now.

1 reaction
maxfierke commented, Nov 20, 2017

Nope, I don’t think so. I was mostly curious about the why rather than the how. It sounds like your concern is with scope creep of E-C, which is totally valid 👍 (though, I will let @machty make the determination of what is/isn’t within scope).

Not so much a blocker with your approach to the add-on, but the reliance on a private module import to add the property modifier to the TaskProperty prototype gives me pause, which is why I suggested a more modest addition to core (relative to the original RFC) of a hook into TaskInstance. (but, I may be getting ahead of myself even more, suggesting hooks into E-C; topic for a different RFC, maybe)

I appreciate the feedback! I’ll play around with the add-on approach a bit 😎
