
Discussion: Sequence ID validation race condition when requesting snapshot via REST


For exchanges that request the l2 snapshot over REST, there is a fairly common race condition that can lead to missing order book data.

The problem, with an example

I’ll use Bittrex as the example, since that’s where the issue happens most often for me. The culprit code looks like this:

  _sendSubLevel2Updates(remote_id, market) {
    this._requestLevel2Snapshot(market);
    this._wss.send(
      JSON.stringify({
        H: "c3",
        M: "Subscribe",
        A: [[`orderbook_${remote_id}_${this.orderBookDepth}`]],
        I: ++this._messageId,
      })
    );
  }

The issue is as follows:

1. I call subscribeLevel2Updates().
2. It first makes a REST request for the snapshot via this._requestLevel2Snapshot(market).
3. While that REST request is in flight, the ws "Subscribe" message is sent to subscribe to l2 updates.
4. The REST call gets its response first and emits an l2snapshot event with a sequenceId of “100”.
5. The ws stream of l2 updates starts arriving, and the first event has a sequenceId of “105”.

The snapshot has a sequenceId of “100”, but the first update has a sequenceId of “105”. This means you missed four updates (101 through 104), and your order book will never be valid.
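This small helper checks the arithmetic above. It is hypothetical and not part of ccxws. It lists the sequence IDs that fall between a snapshot and the first streamed update. It assumes each update increments sequenceId by exactly 1. An exchange that numbers its updates differently would need a different check.

```javascript
// Hypothetical helper (not part of ccxws): list the sequence IDs that were never
// delivered between a snapshot and the first streamed update.
// Assumes consecutive updates increment sequenceId by exactly 1.
function missingSequenceIds(snapshotSeq, firstUpdateSeq) {
  const missing = [];
  for (let seq = snapshotSeq + 1; seq < firstUpdateSeq; seq++) {
    missing.push(seq);
  }
  return missing;
}

console.log(missingSequenceIds(100, 105)); // [ 101, 102, 103, 104 ]
console.log(missingSequenceIds(100, 101)); // [] (no gap: the stream continues the snapshot)
```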

Solutions

Subscribing to l2 updates first, and waiting for the first l2 update before requesting the snapshot over REST, would work. I believe it may be the best backwards-compatible fix. You would have to ignore any of the first few l2 updates that arrive with a sequenceId no newer than the snapshot’s sequenceId. That already has to be handled for REST snapshots in the current implementation. I imagine something like the following (not run or tested):

  _sendSubLevel2Updates(remote_id, market) {
    // When the first l2update for this market arrives, request the snapshot, so the
    // snapshot is taken at a point in time >= when the first l2 update was received.
    // This avoids a gap between the snapshot and the first l2 update.
    const onFirstUpdate = (update, updateMarket) => {
      if (updateMarket.id !== market.id) return;
      // One-shot: detach the listener so it does not linger for the client's lifetime
      this.removeListener("l2update", onFirstUpdate);
      this._requestLevel2Snapshot(market);
    };
    this.on("l2update", onFirstUpdate);
    this._wss.send(
      JSON.stringify({
        H: "c3",
        M: "Subscribe",
        A: [[`orderbook_${remote_id}_${this.orderBookDepth}`]],
        I: ++this._messageId,
      })
    );
  }
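The subscribe-first proposal above still leaves the consumer to reconcile the buffered updates with the snapshot. Below is a minimal, library-independent sketch of that step. `L2Synchronizer` and its methods are hypothetical names. The sketch assumes two things: an update whose sequenceId is at or below the snapshot’s is already reflected in the snapshot, and the next applicable update is exactly snapshot.sequenceId + 1.

```javascript
// Hypothetical reconciliation helper, not part of ccxws.
// Buffers streamed l2 updates until the REST snapshot arrives, skips the updates
// the snapshot already covers, and reports a gap if the rest do not continue it.
// Assumes sequenceId increases by exactly 1 per update.
class L2Synchronizer {
  constructor() {
    this.buffer = []; // updates received while no snapshot is applied
    this.lastSeq = null; // sequenceId of the last snapshot/update applied to the book
  }

  // Call for every streamed update. Returns "buffered", "skipped", "applied", or "gap".
  onUpdate(update) {
    if (this.lastSeq === null) {
      this.buffer.push(update);
      return "buffered";
    }
    const result = this._apply(update);
    if (result === "gap") {
      // The book can no longer be trusted: buffer again until a fresh snapshot arrives
      this.lastSeq = null;
      this.buffer.push(update);
    }
    return result;
  }

  // Call when the REST snapshot arrives. Returns "ok", or "gap" if the snapshot is
  // older than the buffered stream (the caller should then request a new snapshot).
  onSnapshot(snapshot) {
    this.lastSeq = snapshot.sequenceId;
    const pending = this.buffer;
    this.buffer = [];
    for (let i = 0; i < pending.length; i++) {
      if (this._apply(pending[i]) === "gap") {
        // Go back to buffering and keep the updates that were not applied yet
        this.lastSeq = null;
        this.buffer = pending.slice(i);
        return "gap";
      }
    }
    return "ok";
  }

  _apply(update) {
    if (update.sequenceId <= this.lastSeq) return "skipped"; // already in the snapshot
    if (update.sequenceId !== this.lastSeq + 1) return "gap"; // updates were missed
    this.lastSeq = update.sequenceId; // a real consumer would also mutate the book here
    return "applied";
  }
}

// The Bittrex scenario from the issue: snapshot 100, first streamed update 105
const stale = new L2Synchronizer();
stale.onUpdate({ sequenceId: 105 });
console.log(stale.onSnapshot({ sequenceId: 100 })); // gap

// Subscribe-first ordering: the stream is already running when the snapshot lands
const sync = new L2Synchronizer();
[99, 100, 101].forEach(seq => sync.onUpdate({ sequenceId: seq }));
console.log(sync.onSnapshot({ sequenceId: 100 })); // ok (99 and 100 skipped, 101 applied)
console.log(sync.onUpdate({ sequenceId: 102 })); // applied
```

The class keeps the unapplied updates after a gap. A second, newer snapshot can therefore be reconciled against the same buffer without waiting for fresh stream data.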

As a workaround outside this library, code using ccxws can detect this scenario and call client._requestLevel2Snapshot(market) again to get a newer snapshot. It then ignores the updates older than the new snapshot’s sequenceId. This works, and it is what I’m doing right now, but I’m wondering if there’s a better way to handle this generically.
If there is no fix, then I think this scenario should at least be documented prominently somewhere.
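The retry workaround can be sketched as a loop that keeps asking for a snapshot until it connects to the buffered stream. This is an assumption-laden sketch, not a ccxws API. ccxws reports snapshots through the l2snapshot event rather than a return value. So `requestSnapshot` is assumed to be a promise-returning wrapper you write yourself, for example one that resolves on the next l2snapshot event for the market. `fetchUsableSnapshot`, `firstBufferedSeq`, `maxAttempts`, and `retryDelayMs` are hypothetical names.

```javascript
// Hypothetical workaround sketch (not a ccxws API): re-request the snapshot until it
// is new enough to connect to the buffered update stream.
// `requestSnapshot`: assumed promise-returning wrapper around the client's snapshot request.
// `firstBufferedSeq`: returns the sequenceId of the oldest update buffered from the websocket.
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

async function fetchUsableSnapshot(requestSnapshot, firstBufferedSeq, { maxAttempts = 5, retryDelayMs = 250 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const snapshot = await requestSnapshot();
    // Usable when no update between the snapshot and the stream is missing
    if (snapshot.sequenceId + 1 >= firstBufferedSeq()) return snapshot;
    if (attempt < maxAttempts) await sleep(retryDelayMs);
  }
  throw new Error(`snapshot still older than update stream after ${maxAttempts} attempts`);
}

// Usage with a fake snapshot source that is stale (100) on the first call and fresh (110) after
let calls = 0;
const fakeRequest = async () => ({ sequenceId: ++calls === 1 ? 100 : 110 });
fetchUsableSnapshot(fakeRequest, () => 105).then(s => console.log(s.sequenceId)); // 110
```

Bounding the attempts keeps a persistently stale REST endpoint from looping forever. The caller decides what to do when the error is thrown, for example resubscribing.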

Issue Analytics

Created 3 years ago
Comments: 6 (6 by maintainers)

Top GitHub Comments


evan-coygo commented, Jan 14, 2021

I’ll be honest, I didn’t know GitHub Discussions were a thing. Great idea, though, to avoid clutter in Issues. Discussion created here: https://github.com/altangent/ccxws/discussions/255. I’m closing this issue.


bmancini55 commented, Jan 14, 2021

@evan-coygo, thanks for submitting the issue! Yeah, this is sort of a known issue that rears its ugly head when you try to build certain books.

To rehash, the issue is that the socket stream starts after the sequenceId returned by the REST API. In many cases I think this is due to caching of REST API responses. You end up with a snapshot that is slightly stale (lower sequenceId) relative to what you receive over the socket stream. It could also just be that establishing socket subscriptions takes longer than the REST request for some exchanges. I digress on the cause.

From a code perspective, to avoid this issue, you need to request the snapshot after the subscription has been confirmed and you’ve queued a sufficient number of messages. That is intentionally ambiguous, since the sufficient depth is likely exchange- and latency-specific.
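One library-independent way to express this is to count l2update events for one market and request the snapshot only after a chosen number have arrived. Receiving updates stands in for subscription confirmation here, since the client does not yet report subscription success. `snapshotAfterQueued` and `minQueued` are hypothetical names. The right `minQueued` value is the exchange- and latency-specific guess described above.

```javascript
const { EventEmitter } = require("events");

// Hypothetical helper (not part of ccxws): after `minQueued` l2update events for
// `marketId` have been seen on `client`, call `requestSnapshot` exactly once.
// Listeners receive (update, market), matching how the l2update handler above is written.
function snapshotAfterQueued(client, marketId, minQueued, requestSnapshot) {
  let seen = 0;
  const onUpdate = (update, market) => {
    if (market.id !== marketId) return;
    seen += 1;
    if (seen < minQueued) return;
    client.removeListener("l2update", onUpdate); // fire only once
    requestSnapshot();
  };
  client.on("l2update", onUpdate);
}

// Usage with a plain EventEmitter standing in for a ccxws client
const fakeClient = new EventEmitter();
let requested = 0;
snapshotAfterQueued(fakeClient, "BTC-USD", 3, () => requested++);
for (let seq = 1; seq <= 5; seq++) {
  fakeClient.emit("l2update", { sequenceId: seq }, { id: "BTC-USD" });
}
console.log(requested); // 1
```

The updates that arrive before the snapshot still need to be buffered and reconciled by sequenceId. This helper only controls *when* the snapshot is requested.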

In the past I’ve recommended monkey patching a timed delay over _requestLevel2Snapshot(market) on the client:

// Defer every snapshot request so the websocket subscription has time to start streaming first
const REST_DELAY_MS = 500;
client._originalRequestLevel2Snapshot = client._requestLevel2Snapshot;
client._requestLevel2Snapshot = market =>
  setTimeout(() => client._originalRequestLevel2Snapshot(market), REST_DELAY_MS);
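The patch above can be generalized into a small helper that also returns an undo function. That may help when tearing down or re-creating clients. `delayMethod` is a hypothetical name. The stand-in object below only mimics the patched method and is not a real ccxws client.

```javascript
// Hypothetical generic version of the monkey patch: wrap a method so each call is
// deferred by `delayMs`, and return a function that restores the original method.
// Note: the wrapped call returns a timer handle, not the original return value.
function delayMethod(obj, name, delayMs) {
  const original = obj[name];
  obj[name] = (...args) => setTimeout(() => original.apply(obj, args), delayMs);
  return () => {
    obj[name] = original;
  };
}

// Usage with a stand-in client object
const client = {
  calls: [],
  _requestLevel2Snapshot(market) {
    this.calls.push(market.id);
  },
};
const restore = delayMethod(client, "_requestLevel2Snapshot", 500);
client._requestLevel2Snapshot({ id: "BTC-USD" });
console.log(client.calls.length); // 0 (still waiting on the timer)
setTimeout(() => {
  console.log(client.calls); // [ 'BTC-USD' ]
  restore();
}, 600);
```

Pending timers are not cancelled. A delayed request scheduled just before a disconnect can therefore still fire afterwards, and this sketch does not handle that.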

However, this solution isn’t foolproof and may introduce some funkiness on reconnections.

Ideally (as usual), the real fix will come after the refactor (#149), which will include subscription success/failure reporting (#103). Firing the snapshot request after a delay once the subscription succeeds would be best.

As a fallback, though, I think you need to be prepared to call _requestLevel2Snapshot again if you receive a snapshot older than your update stream. Which should certainly be documented!