maxProcessing config parameter in ns instead of ms

When running CometD 7.0.5 in production, CometD was continually increasing its memory usage. From a heap dump, it appears that much of the memory usage was due to old ServerSessionImpl instances holding references to a large number of ServerMessageImpl instances and their related data. I thought that CometD would use the timeout configuration parameter to remove old ServerSessionImpl instances, but it appears that this is not the case. The CometD documentation specifies a maxProcessing setting at https://docs.cometd.org/current/reference/#_java_server_configuration which seems like it would resolve this apparent memory leak: “The maximum period of time, in milliseconds, that the server waits for the processing of a message before considering the session invalid and removing it.”

However, the actual implementation adds the value of the maxProcessing setting to a time measured in nanoseconds:

https://github.com/cometd/cometd/blob/d4536b9c4233e318cf40e1dc7e3cada9f92cfa33/cometd-java/cometd-java-server/cometd-java-server-common/src/main/java/org/cometd/server/ServerSessionImpl.java#L159
https://github.com/cometd/cometd/blob/d4536b9c4233e318cf40e1dc7e3cada9f92cfa33/cometd-java/cometd-java-server/cometd-java-server-common/src/main/java/org/cometd/server/BayeuxServerImpl.java#L1298

As a result, the maxProcessing setting would also need to be defined in nanoseconds, not milliseconds as the documentation specifies.
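To illustrate the unit mismatch described above, here is a minimal standalone sketch. It does not use CometD code: the class and method names are hypothetical, and it only assumes that the deadline is computed as "nanosecond timestamp plus configured value", as the linked lines suggest.

```java
import java.util.concurrent.TimeUnit;

public class MaxProcessingUnits {
    // Hypothetical helper mirroring the reported pattern: the configured
    // value is added directly to a nanosecond timestamp, with no conversion.
    static long deadlineNanos(long nowNanos, long maxProcessing) {
        return nowNanos + maxProcessing;
    }

    // Hypothetical alternative: treat the configured value as milliseconds
    // (as documented) and convert it to nanoseconds before adding.
    static long deadlineNanosFromMillis(long nowNanos, long maxProcessingMillis) {
        return nowNanos + TimeUnit.MILLISECONDS.toNanos(maxProcessingMillis);
    }

    public static void main(String[] args) {
        long now = System.nanoTime();
        long configured = 60_000L; // intended as 60 seconds, expressed in milliseconds

        long unconverted = deadlineNanos(now, configured) - now;
        long converted = deadlineNanosFromMillis(now, configured) - now;

        // 60000 ns is only 60 microseconds, not 60 seconds.
        System.out.println("treated as ns:     " + TimeUnit.NANOSECONDS.toMicros(unconverted) + " us");
        System.out.println("converted from ms: " + TimeUnit.NANOSECONDS.toSeconds(converted) + " s");
    }
}
```

Under that assumption, a value chosen by following the documentation would expire about a million times sooner than intended.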

Although ServerSessionImpl uses a long for the _maxProcessing variable, setting maxProcessing to more than 2^31 nanoseconds (about 2.1 seconds) causes a NumberFormatException, which stops CometD from working at all.
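One possible explanation for the 2^31 limit, shown here as a sketch only, is that the option string is parsed as a 32-bit int somewhere before it reaches the long field. That is an assumption and has not been checked against the CometD source; the class and method names below are hypothetical.

```java
public class MaxProcessingParse {
    // Returns true if the text parses as a 32-bit int (max 2147483647).
    static boolean fitsInInt(String value) {
        try {
            Integer.parseInt(value);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The two values used in the reproduction steps of this report.
        for (String value : new String[] {"2100000000", "2200000000"}) {
            System.out.println(value
                    + " fits in int: " + fitsInInt(value)
                    + ", as long: " + Long.parseLong(value));
        }
    }
}
```

With int parsing, 2100000000 is accepted and 2200000000 is rejected, which matches the behaviour reported here; parsing as a long accepts both.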

Tested with:

  • CometD 7.0.5
  • Jetty 11.0.7
  • openjdk 11.0.13

How to reproduce:

Set the maxProcessing configuration setting to 2100000000 and verify that sessions are purged every 2.1 seconds:

<init-param>
  <param-name>maxProcessing</param-name>
  <param-value>2100000000</param-value>
</init-param>

Set the maxProcessing configuration setting to 2200000000 and verify that clients are unable to connect at all:

<init-param>
  <param-name>maxProcessing</param-name>
  <param-value>2200000000</param-value>
</init-param>

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 14 (7 by maintainers)

Top GitHub Comments

zaynetro commented, Feb 15, 2022

We are seeing similar issues after upgrading from 5.0.2 to 5.0.10. The symptoms are similar: after a server has been running for several days, some ServerSessionImpl instances are not removed. It doesn’t happen to all sessions, though. When we stop new incoming connections to the server, the number of server sessions doesn’t drop to zero even after many hours.

Our current way of coping with this is to set a max queue option so that we can clean at least some sessions.

Configuration:

  • Default transports (our clients use mostly websocket)
  • timeout = 60000
  • maxQueue = 250
  • maxInterval = 600000 (10 mins)
  • ws.messagesPerFrame = 2
  • custom jsonContext which uses addConvertor (that is something I forgot to change)
  • Ack extension

Clients run cometd 5.0.10 from npm.

We couldn’t find the root cause and couldn’t replicate the issue ourselves, but we have a hypothesis: a client initiates a connection, but then a page reload happens, leaving the session only partially set up. There could be several consecutive page reloads.

An example of a session that hit the max queue limit despite us setting maxInterval (notice that lastConnected is null), followed by the listener code that produced the log line:

W|2022-02-15T05:46:17.664Z| =====>>> Session queue maxed for cometdSession 2q4so3ufya9186od1r9q082kgyp7d.
  Disconnecting cometd session isConnected=false isHandshook=false startedAt=2022-02-15T04:27:33.974207Z lastConnected=null.

session.setAttribute( "startedAt", OffsetDateTime.now() );
session.addListener( new ServerSession.QueueMaxedListener() {
    @Override
    public boolean queueMaxed( ServerSession session, Queue<ServerMessage> queue, ServerSession sender,
            ServerMessage message ) {
        LOG.warn(
                "=====>>> Session queue maxed for cometdSession {}."
                        + " Disconnecting cometd session isConnected={} isHandshook={} startedAt={} lastConnected={}.",
                cometdSessionId, 
                session.isConnected(), session.isHandshook(),
                session.getAttribute( "startedAt" ), session.getAttribute( "lastConnected" ) );
        session.disconnect();
        return false;
    }
} );

session.addListener( new ServerSession.HeartBeatListener() {
    @Override
    public void onResumed( ServerSession session, ServerMessage message, boolean timeout ) {
        // Called when the server resumes a suspended /meta/connect, just before replying to the client
    }

    @Override
    public void onSuspended( ServerSession session, ServerMessage message, long timeout ) {
        // Called when the server receives a /meta/connect from the client and suspends it
        session.setAttribute( "lastConnected", OffsetDateTime.now() );
    }
} );

We would be interested in testing the latest fixes once they are released to Maven Central.

youngj commented, Feb 18, 2022

In order to test the fixes I would also need a new version to be released to Maven Central.
