RSS keeps growing until the server crashes
I've been triaging this bug for a while now. I've been testing with various setups and stripping things down to exclude possible causes. This is what I've come up with:
Client - socket.io 1.3.7
var socket = io('http://<my Socket.IO host name>.elasticbeanstalk.com:80', {
  reconnection: false,
  transports: ['websocket'],
  upgrade: false
});
- Not reconnecting is acceptable for our use case, and I wanted to reduce hits to the server
- I chose to skip long-polling: browser support for WebSockets is acceptable for our use case, and while researching I found claims that long-polling could cause connection-related issues
Server - socket.io 1.3.7
require("appdynamics").profile(<my AppDynamics params, using SSL>);

var server = require('http').createServer(),
    io = require('socket.io')(server, {
      serveClient: false,
      origins: "http://www.<my web host name>.com:*",
      transports: ['websocket'],
      allowUpgrades: false
    });

io.on('connection', function(socket) {
  socket.on('disconnect', function() {});
});

server.listen(process.env.PORT);
This is not a sample; it is the actual server.js running right now. I have been stripping my functions down one by one until I came to this, essentially Socket.IO boilerplate code.
The reason for this is that while researching and reading related issues, much of the advice I found blamed badly written user code: bad variable scope decisions, leaked memory, etc.
This way, I’m pretty sure that none of the code I wrote caused this issue (since this still happens).
To explain options:
- I turned off the client serving and I’m serving client JS from our web host, to reduce hits to the Socket.IO server
- As mentioned, I chose not to use long-polling, to exclude it as a source of issues
Hardware
- Amazon Elastic Beanstalk NodeJS SaaS
- single m1.medium instance with 3.75 GB RAM
- running without a proxy (to exclude that as the issue as well)
- swapping is turned off (to exclude that as the issue)
- NodeJS 0.12.6
Traffic
- about 1M connections per day, but it never gets that far, usually crashes around 0.5M
- as you saw before, I’m not emitting any messages
- max number of concurrent users is approximately 4000
Symptoms
- with new connections coming in, the RSS (resident set size) rises immediately, usually at a rate of 400-500 MB per hour
- with no new connections, the RSS just stays the same; it does not drop
- when the RSS hits the hardware limit (3.4 GB), the NodeJS process crashes
- before I turned swapping off, the process started swapping once all of the RAM was used, and the app started to lag and drop connections
- contrary to the RSS, the heap remains more or less stable at 100-300 MB, hinting that the issue might be buffer-related (i.e. memory held outside the V8 heap)
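One way to watch this split between RSS and heap directly (a minimal sketch added for illustration, not part of the original report) is to log process.memoryUsage() periodically:

```javascript
// Periodically log the gap between RSS and the V8 heap.
// A growing rss with a flat heapUsed points at memory held
// outside the heap: Buffers, native allocations, fragmentation.
function logMemory() {
  var m = process.memoryUsage();
  var toMB = function(bytes) { return (bytes / 1024 / 1024).toFixed(1); };
  console.log(
    'rss=' + toMB(m.rss) + 'MB',
    'heapUsed=' + toMB(m.heapUsed) + 'MB',
    'offHeap=' + toMB(m.rss - m.heapUsed) + 'MB'
  );
}

setInterval(logMemory, 60 * 1000).unref(); // once a minute; unref so it never keeps the process alive
logMemory();
```

Correlating this log with the connection count would show whether the off-heap portion grows per connection or per time.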
Monitoring
- using the AppDynamics NodeJS agent (note: this issue has also appeared when I had the NewRelic agent instead, and even with no monitoring agent at all, only watching the server stats from Munin)
- This is a screenshot from this morning. The drop at 9:46 is a NodeJS restart. Hint: the increasing number of connections is questionable, since Google Analytics reports lower and more stable numbers.
Issue Analytics
- State:
- Created: 8 years ago
- Comments: 7 (2 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
@nebkam I tracked it and realised that it's caused by poor internet connectivity on the client side; with such clients, the memory usage goes up quickly. I am hoping that socket.io could add an option to periodically check each client's connectivity status and close the dead ones.
Thanks for the library recommendation. Will definitely look into it.
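Engine.IO (the transport layer under Socket.IO) already has a heartbeat mechanism that does roughly this. A sketch of tightening it, assuming the `pingInterval`/`pingTimeout` options are passed through as in Socket.IO 1.x (the values below are illustrative, not tuned):

```javascript
// Sketch: lower the heartbeat interval/timeout so sockets whose
// clients silently vanish (flaky connectivity, closed laptops)
// are detected and closed sooner instead of lingering in memory.
var server = require('http').createServer(),
    io = require('socket.io')(server, {
      transports: ['websocket'],
      allowUpgrades: false,
      pingInterval: 10000, // heartbeat round-trip every 10 s
      pingTimeout: 5000    // close the socket if no reply within 5 s
    });

server.listen(process.env.PORT);
```

The trade-off is extra heartbeat traffic per connection, which matters at ~4000 concurrent users.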
@komorebi-san After experimenting with different adapters for rooms (for example, MongoDB instead of memory), I noticed that a considerable number of sockets were not being cleaned up; they stayed on the server, taking up resources. Back then, I traced it to some older browsers not properly disconnecting the socket on browser close. So I chose to implement the feature without support for older browsers: I replaced my Socket.IO code with the low-level ws library and wrote my own custom "room" logic. That server is in production to this day, serving 16M pageviews/month.
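The custom "room" logic is not shown in the issue; below is a minimal sketch of how such bookkeeping might look on top of ws (the function names and the socket shape are hypothetical, not from the original comment):

```javascript
// Minimal room bookkeeping: a Map from room name to a Set of sockets.
// Any object with a send(message) method works, so raw ws sockets fit.
var rooms = new Map();

function join(room, socket) {
  if (!rooms.has(room)) rooms.set(room, new Set());
  rooms.get(room).add(socket);
}

function leave(room, socket) {
  var members = rooms.get(room);
  if (!members) return;
  members.delete(socket);
  if (members.size === 0) rooms.delete(room); // free empty rooms
}

function broadcast(room, message) {
  var members = rooms.get(room);
  if (!members) return 0;
  members.forEach(function(socket) { socket.send(message); });
  return members.size;
}
```

On a real ws server you would call join on connection and leave in the socket's 'close' and 'error' handlers, so dead sockets cannot accumulate the way the leaked Socket.IO sockets did.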