WebSocket connection retry handling is severely broken in Firefox
Setup:
- OS: macOS Monterey v12.4
- Firefox: 106.0.5 (64-bit)
- NATS Server: v2.9.6
- nats.ws: v1.10.0
# server.conf
websocket {
  port: 9222,
  no_tls: true
}
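// Web app (config.URL is the app's own setting for the WebSocket endpoint)
import { connect } from "nats.ws";

const conn = await connect({
  servers: config.URL,
  maxReconnectAttempts: -1, // Retry forever
  waitOnFirstConnect: true,
});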
Steps to reproduce:
- Start NATS server with WebSockets enabled.
- Open your web app in Firefox. The app connects to the NATS server’s WebSocket endpoint.
- Everything is fine.
- Stop the NATS server for 3 minutes or so.
- Start the NATS server again.
- After a while, the web app reconnects to the NATS server, but it does so by establishing several WebSocket connections at once.
- Expected: only one connection. Actual: 7 or so.
This happens only in Firefox. Chrome and Safari work without problems. I tried both localhost and non-localhost setups.
This is most probably related to this Firefox behavior (see the accepted answer):
Found it. This is intentional behavior to comply with RFC 6455. Per this patch, it uses an exponential backoff up to 60 seconds max. Unfortunately I think this means auto-reconnecting to a websocket endpoint in Firefox is broken.
- https://stackoverflow.com/questions/59548618/firefox-doesnt-close-websocket-immediately-on-connection-error
- https://bugzilla.mozilla.org/show_bug.cgi?id=711793
I took some screenshots from Chrome (left) and Firefox (right) where this exponential backoff is clearly visible:
Here you see how multiple connections (there should be only one) are established after restarting the NATS server. You can also see the connections at the NATS server’s http://localhost:8222/connz endpoint, or by running nats server report connections from the command line.
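To make the duplicate connections easier to spot, the /connz output can be grouped by client address. The following is only a rough sketch: `groupConnectionsByIp` is a hypothetical helper, and it assumes the monitoring response contains a `connections` array whose entries carry `ip` and `cid` fields.

```javascript
// Hypothetical helper: groups the entries of a /connz response by client IP.
// Assumes each entry in `connections` has `ip` and `cid` fields.
function groupConnectionsByIp(connz) {
  const byIp = new Map();
  for (const c of connz.connections ?? []) {
    const cids = byIp.get(c.ip) ?? [];
    cids.push(c.cid);
    byIp.set(c.ip, cids);
  }
  return byIp;
}

// Example usage (assumes Node 18+ for the global fetch, and the monitoring
// port from the setup above):
// const res = await fetch("http://localhost:8222/connz");
// const byIp = groupConnectionsByIp(await res.json());
// for (const [ip, cids] of byIp) console.log(ip, cids.length, cids);
```

If a single browser tab shows up with more than one connection ID under the same address after the server restart, that matches the behavior described in this issue.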
Top GitHub Comments
Yes, I tried a number of things, and even pre-cancelling (which actually happens now) is not honored. Marking the WebSocket as cancelled also has no effect (I tried disposing of it on creation). The current workaround is what you have. While the Firefox folks are doing the right thing by the RFC, the bigger issue is that the client has no way to detect that this is happening or to learn what throttling is being imposed. One thought I had was to make a version of the reconnect jitter (which takes a function) that implements a similar exponential backoff. This is on my list to do.
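The exponential backoff idea from the comment above could look roughly like this on the application side. This is a sketch under assumptions, not the maintainers' implementation: `createBackoffDelay` is a hypothetical helper, the 60-second cap simply mirrors the Firefox limit quoted earlier, and the commented usage assumes the client's `reconnectDelayHandler` option (a function returning the delay in milliseconds) and its `status()` iterator.

```javascript
// Hypothetical helper: produces reconnect delays that double on each attempt,
// capped at maxMs, with a random jitter of up to jitterMs added on top.
function createBackoffDelay({
  baseMs = 1000,
  maxMs = 60000, // mirrors the 60 s cap Firefox is reported to use
  jitterMs = 500,
  random = Math.random,
} = {}) {
  let attempt = 0;
  return {
    // Returns the delay for the next reconnect attempt, in milliseconds.
    next() {
      const delay = Math.min(maxMs, baseMs * 2 ** attempt);
      attempt += 1;
      return delay + Math.floor(random() * jitterMs);
    },
    // Call after a successful reconnect to start again from baseMs.
    reset() {
      attempt = 0;
    },
  };
}

// Example usage (assumption: the option and event names below match the
// client version in use):
// const backoff = createBackoffDelay();
// const conn = await connect({
//   servers: config.URL,
//   maxReconnectAttempts: -1,
//   waitOnFirstConnect: true,
//   reconnectDelayHandler: () => backoff.next(),
// });
// (async () => {
//   for await (const s of conn.status()) {
//     if (s.type === "reconnect") backoff.reset();
//   }
// })();
```

Spacing the client's attempts at least as far apart as the browser's own throttle might reduce the number of queued WebSocket connections, but this thread does not confirm that it does.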
@heikkilamarko you are absolutely right: if you wait a very long time, the behavior is completely different.
The crazy thing is that my handler can detect that it was closed, but it’s like the API calls have no effect.