
Disconnect with sticky-sessions on 4 CPU at 1K connections

See original GitHub issue

This one is a bit frustrating.

Hardware: Chat server: one EC2 c3.xlarge instance (4 vCPUs, compute-optimized); test servers: three EC2 t2.small instances.

Setup: 3 worker Node instances running on 3 CPUs, with the master Node process meant to run on the 4th CPU. Each test machine opens 1 new connection every 100 milliseconds.

Problem: before about 1K connections (3K across all CPUs), there are no disconnect events. As soon as it reaches around 1K, disconnect events start mixing in with connect events about half the time. Something like this on all 3 CPUs:

    Disconnected
    Connected[1035] => 5643_1035
    Connected[1036] => 5643_1036
    Connected[1037] => 5643_1037
    Connected[1038] => 5643_1038
    Connected[1039] => 5643_1039
    Connected[1040] => 5643_1040
    Disconnected
    Connected[1042] => 5643_1042
    Connected[1043] => 5643_1043
    Connected[1044] => 5643_1044
    Connected[185] => 5643_185
    Connected[1045] => 5643_1045

(Even if I ramp client connections more slowly, e.g. 1 connection every 300 milliseconds, the exact same problem occurs. Even if I reduce the 3 test machines to 1 and keep the server code exactly the same, disconnects start around 2,500 connections at 1 new connection every 40 milliseconds. The weird part is that the disconnects only start once a certain number of clients are connected.)
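For reference, the ramp described above can be sketched as a small client driver. The pacing helper below is pure; the actual connection loop is gated behind an environment flag because it assumes socket.io-client is installed and a server URL is supplied (both are assumptions, not part of the original setup):

```javascript
// Pacing helper: returns the planned start offsets (in ms) for `total`
// connections opened one every `intervalMs`.
function schedule(total, intervalMs) {
    var times = [];
    for (var i = 0; i < total; i++) times.push(i * intervalMs);
    return times;
}

// Hypothetical driver: only runs when RUN_LOAD_TEST is set, since it
// needs socket.io-client and a reachable server (assumptions).
if (process.env.RUN_LOAD_TEST) {
    var ioClient = require('socket.io-client');
    var SERVER_URL = process.env.SERVER_URL; // e.g. http://host:3001

    schedule(1000, 100).forEach(function(offset, i) {
        setTimeout(function() {
            var s = ioClient(SERVER_URL);
            s.on('connect', function() { console.log('connected', i); });
            s.on('disconnect', function() { console.log('disconnected', i); });
        }, offset);
    });
}
```

Counting the `disconnected` lines per ramp rate is enough to reproduce the threshold behavior described above.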

mpstat -P ALL 3 gives:

    05:33:22 PM  CPU    %usr  %nice  %sys  %iowait  %irq  %soft  %steal  %guest  %idle
    05:33:25 PM  all   14.16   0.00  1.55     0.00  0.00   0.86    0.17    0.00  83.25
    05:33:25 PM  0     12.80   0.00  1.04     0.00  0.00   0.00    0.00    0.00  86.16
    05:33:25 PM  1     16.96   0.00  2.08     0.00  0.00   3.81    0.35    0.00  76.82
    05:33:25 PM  2     13.49   0.00  1.73     0.00  0.00   0.00    0.35    0.00  84.43
    05:33:25 PM  3     12.76   0.00  1.38     0.00  0.00   0.00    0.00    0.00  85.86

My questions: 1. Why do the disconnects happen while CPU usage is this low? 2. All connection events go through the master; can that become a bottleneck?

Code: I copy-pasted my code directly into the sticky-sessions source code:

// Setup basic express server
var async = require('async');
var express = require('express');
var io;
var port = 3001;
var usernames = {};
var sockets = {};
var numUsers = 0;


var net = require('net'),
    cluster = require('cluster');

function hash(ip, seed) {
    var hash = ip.reduce(function(r, num) {
        r += parseInt(num, 10);
        r %= 2147483648;
        r += (r << 10)
        r %= 2147483648;
        r ^= r >> 6;
        return r;
    }, seed);

    hash += hash << 3;
    hash %= 2147483648;
    hash ^= hash >> 11;
    hash += hash << 15;
    hash %= 2147483648;

    return hash >>> 0;
}


var server;

// `num` argument is optional; default to one worker per CPU,
// leaving the last CPU for the master
var num;
if (typeof num !== 'number')
    num = require('os').cpus().length - 1;

// Master will spawn `num` workers
if (cluster.isMaster) {
    var workers = [];
    for (var i = 0; i < num; i++) {
        ! function spawn(i) {
            workers[i] = cluster.fork();
            // Restart worker on exit
            workers[i].on('exit', function() {
                console.error('sticky-session: worker died');
                spawn(i);
            });
        }(i);
    }

    var seed = ~~(Math.random() * 1e9);
    server = net.createServer(function(c) {
        // Get int31 hash of ip
        var worker,
            ipHash = hash((c.remoteAddress || '').split(/\./g), seed);
        console.log(c.remoteAddress);
        // Pass connection to worker
        worker = workers[ipHash % workers.length];
        worker.send('sticky-session:connection', c);
    }).listen(port);
} else {
    startServer();
}



function doStickyClusterStuff() {
    // Worker process
    process.on('message', function(msg, socket) {
        if (msg !== 'sticky-session:connection') return;

        server.emit('connection', socket);
    });

    if (!server) throw new Error('Worker hasn\'t created server!');

    // Monkey-patch server.listen so the worker does not bind to the port
    var oldListen = server.listen;
    server.listen = function listen() {
        var lastArg = arguments[arguments.length - 1];

        if (typeof lastArg === 'function') lastArg();

        return oldListen.call(this, null);
    };
}


function startServer() {
    var app = express();

    // Don't expose our internal server to the outside.
    server = app.listen(0, 'localhost');
    io = require('socket.io')(server);

    doStickyClusterStuff();

    console.log("listening to port " + port);
    io.adapter(require('socket.io-redis')({
        host: 'localhost',
        port: 5222
    }));

    io.on('connection', function(socket) {
        console.log("new connection");
        sockets[socket.id] = socket;
        console.log("\nadded soccketId" + socket.id + " socket size " + Object.keys(sockets).length);

        socket.on('disconnect', function() {
            console.log("socket disconnect");
            delete sockets[socket.id];
        });
    });
}

This is my server log, in case it helps (it seems there are multiple calls for one connection event, which is not very efficient):

added soccketIdVEgwmvUfv6UjrgR0AADt socket size 238
10.65.181.163
10.65.181.163
10.65.181.163
10.65.181.163
10.183.165.158
new connection

added soccketIdxjiIYbB-Hjv7VQepAADu socket size 239
10.183.165.158
10.183.165.158
10.230.30.152
new connection

added soccketId0dP0S3IQRi_3cWUzAAB1 socket size 118
10.230.30.152
10.230.30.152
10.183.165.158
10.183.165.158
10.230.30.152
10.230.30.152
10.65.181.163
new connection

added soccketIdurbUpuuZdzlGoCgyAADv socket size 240
10.65.181.163
10.65.181.163
10.65.181.163
10.65.181.163
10.183.165.158
new connection

added soccketIdUCF6UP4HvKouFIMcAADw socket size 241
10.183.165.158
10.183.165.158
10.230.30.152
new connection

added soccketIdqQTukWGwbPbCethaAAB2 socket size 119
10.230.30.152
10.230.30.152
10.183.165.158
10.183.165.158
10.230.30.152
10.230.30.152
10.65.181.163
new connection

added soccketIdCpyl_tKXUtqxcJ_aAADx socket size 242
10.65.181.163
10.65.181.163
10.65.181.163
10.65.181.163
10.183.165.158
new connection

added soccketIds5pWfaEI_AUDM1LCAADy socket size 243
10.183.165.158
10.230.30.152
10.183.165.158
new connection

added soccketIdTTtN83qWGaSoEt9gAAB3 socket size 120
10.230.30.152
10.230.30.152
10.183.165.158
10.183.165.158
10.230.30.152
10.230.30.152
10.65.181.163
new connection

added soccketIdtu0DH4nu5eFGzmJJAADz socket size 244
10.65.181.163
10.65.181.163
10.65.181.163
10.65.181.163
10.183.165.158
new connection

added soccketIdnzrEz83lKxy3VKtbAAD0 socket size 245
10.183.165.158
10.183.165.158
10.230.30.152
new connection

added soccketIdexUmWYfM4YLRvzzBAAB4 socket size 121
10.230.30.152
10.230.30.152
10.183.165.158
10.183.165.158
10.230.30.152
10.230.30.152
10.65.181.163
new connection

added soccketIdKS9xKndlvYWajGH6AAD1 socket size 246
10.65.181.163
10.65.181.163
10.65.181.163
10.65.181.163
10.183.165.158
new connection

added soccketIdobXaIdChCMC41Ee-AAD2 socket size 247
10.183.165.158
10.183.165.158
10.230.30.152
new connection

added soccketId6ui-avz6Hj9s-I0aAAB5 socket size 122
10.230.30.152
10.230.30.152
10.183.165.158
10.183.165.158
10.230.30.152
10.230.30.152

Issue Analytics

  • State: closed
  • Created: 9 years ago
  • Comments: 10

Top GitHub Comments

3 reactions
sabrehagen commented, Sep 26, 2015

@ynkm169 Are you able to make your code public so we can all share in its greatness please?

0 reactions
e-m-s-y commented, May 12, 2016

@ynkm169 How did you test this?

Read more comments on GitHub >

