[Streaming-Core] frequent websocket disconnects

Hi all,

I have spent some time tracking (and attempting to improve) websocket connection reliability in XChange and wanted to detail what I have so far for anyone who may be experiencing the same thing (and anyone that may have time to also work on improving this).

TL;DR: I am seeing websocket disconnects at intervals ranging from about one hour to about 12 hours. That is a lot: a conversation with Kroitor @ CCXT made me aware that they can go several days without experiencing a disconnect (for CoinbasePro/Kraken). I suspect these disconnects are self-inflicted.

We have an IdleReadTimeout (how long we go without reading anything over the websocket), which triggers a disconnect if we don’t see anything for 15 seconds. I had added https://github.com/knowm/XChange/commit/3010143ce0d487756943d6178aefbe865ff91ef4 to send a ping, because channels that weren’t busy would disconnect every few minutes.
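
To make the interaction between the idle read timeout and the ping concrete, here is a minimal sketch in plain Java. It is not the XChange or Netty code: the IdleWatchdog class, its Action enum, and the 5-second ping threshold are all hypothetical, chosen only to illustrate the idea. Only the 15-second disconnect threshold comes from the description above.

```java
import java.time.Duration;

// Hypothetical helper for illustration only; not part of XChange or Netty.
final class IdleWatchdog {
    enum Action { NONE, SEND_PING, DISCONNECT }

    private final long pingAfterNanos;
    private final long disconnectAfterNanos;
    private long lastReadNanos;
    private boolean pingSent;

    IdleWatchdog(Duration pingAfter, Duration disconnectAfter, long nowNanos) {
        if (pingAfter.compareTo(disconnectAfter) >= 0) {
            throw new IllegalArgumentException("pingAfter must be shorter than disconnectAfter");
        }
        this.pingAfterNanos = pingAfter.toNanos();
        this.disconnectAfterNanos = disconnectAfter.toNanos();
        this.lastReadNanos = nowNanos;
    }

    // Call whenever anything (data frame or pong) is read from the socket.
    void onRead(long nowNanos) {
        lastReadNanos = nowNanos;
        pingSent = false;
    }

    // Call periodically; tells the caller what to do at time nowNanos.
    Action check(long nowNanos) {
        long idle = nowNanos - lastReadNanos;
        if (idle >= disconnectAfterNanos) {
            return Action.DISCONNECT;
        }
        if (idle >= pingAfterNanos && !pingSent) {
            pingSent = true; // at most one ping per quiet period
            return Action.SEND_PING;
        }
        return Action.NONE;
    }

    public static void main(String[] args) {
        long second = 1_000_000_000L;
        // Assumed thresholds: ping after 5 s of silence, disconnect after 15 s.
        IdleWatchdog watchdog =
                new IdleWatchdog(Duration.ofSeconds(5), Duration.ofSeconds(15), 0L);
        System.out.println(watchdog.check(6 * second));  // quiet channel: ping
        watchdog.onRead(7 * second);                     // pong arrives, timer resets
        System.out.println(watchdog.check(23 * second)); // 16 s of silence: disconnect
    }
}
```

On a quiet channel the ping should provoke a pong, which counts as a read and resets the timer, so only a channel that stays silent even after the ping reaches the disconnect threshold.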

This improved the disconnect rate for me from every few minutes to every few hours, but it is still not what it should be. True websocket disconnects tend to be tricky to detect correctly: if a server process dies without notifying the client through a shutdown/disconnect hook, the client has to figure out on its own whether the channel is dead, and this is a bit empirical. Things like a long GC pause on the server side can make the client think the channel is dead when it actually isn’t, so read timeouts tend to be a best effort.

In any case, of the disconnects I see, it is very rarely the server sending a close-channel message, which would print this log line: LOG.info("WebSocket Client received closing! {}", ctx.channel());. Most disconnects appear to be just an inactive channel.

At this stage I have modified the IdleStateHandler with additional logging to see what the stack trace is before a channel has its inactive methods called. I have attached said stack_trace.log. What is logged as “read message” is us reading a message over the websocket channel; you can see there is an unusual gap before we start triggering the idle code. However, this gap is much smaller than the idle timeout. TempIdleStateHandler is what I’ve replaced IdleStateHandler with in the NettyStreamingService class.
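
For anyone wanting to add similar logging, the sketch below shows one way to capture the current stack trace as a string in plain Java. It is not the TempIdleStateHandler from this issue (that code is not shown here); StackTraceProbe and capture are hypothetical names. In a Netty handler, a call like this could be placed in an overridden channelInactive(ChannelHandlerContext) before delegating to super.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical helper for illustration only; not the handler used in this issue.
final class StackTraceProbe {

    // Returns the caller's stack trace as text, labelled with the event name.
    static String capture(String event) {
        StringWriter out = new StringWriter();
        // A Throwable records the stack at construction time; it is never thrown.
        new Throwable("probe: " + event).printStackTrace(new PrintWriter(out, true));
        return out.toString();
    }

    public static void main(String[] args) {
        // In a real handler this string would go to the logger instead of stdout.
        System.out.println(capture("channelInactive"));
    }
}
```

The trace shows which code path led to the callback, which is what distinguishes an idle-timeout close from a close triggered elsewhere.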

I still have not figured out what the source of this issue is but hopefully this can be useful to anyone who has noticed this.

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Reactions: 6
  • Comments: 17 (6 by maintainers)

Top GitHub Comments

2 reactions
earce commented, Dec 5, 2020

Another update: after opening an issue with the Netty folks (https://github.com/netty/netty/issues/10830), it seems this is being triggered because the websocket is receiving an EOF.
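
As a rough illustration of what “receiving an EOF” means at the Java NIO level: a read on a channel returns -1 once the writing side has closed. The sketch below uses java.nio.channels.Pipe as a stand-in for a socket so that it runs without a network. EofProbe and readUntilEof are hypothetical names, and this is not Netty’s code.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not taken from Netty or XChange.
final class EofProbe {

    // Reads from a blocking channel until end-of-stream; returns the bytes read.
    static int readUntilEof(ReadableByteChannel channel) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(64);
        int total = 0;
        while (true) {
            buffer.clear();
            int n = channel.read(buffer);
            if (n < 0) {
                // -1 means the writing side closed: this is the EOF signal.
                return total;
            }
            total += n;
        }
    }

    public static void main(String[] args) throws IOException {
        Pipe pipe = Pipe.open();
        ByteBuffer message = ByteBuffer.wrap("ping".getBytes(StandardCharsets.US_ASCII));
        while (message.hasRemaining()) {
            pipe.sink().write(message);
        }
        pipe.sink().close(); // the "remote" side goes away
        System.out.println("bytes before EOF: " + readUntilEof(pipe.source()));
        pipe.source().close();
    }
}
```

A reader that sees this signal has no way to tell a deliberate shutdown from a dropped peer, so treating it as a close is the usual reaction.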

2 reactions
earce commented, Nov 30, 2020

Small update: I have tracked down what is triggering the channel closing in the Netty code. In AbstractNioByteChannel.java, the close flag is set, which triggers closeOnRead(pipeline). It appears the websocket is entering a half-open state, but I will update when I have more.
