File descriptor leak caused by clients prematurely closing connections
Hi! 👋 We’ve been using the JMX exporter to instrument Cassandra (using the javaagent on version 0.3.1).
We recently had an incident caused by Cassandra running out of file descriptors. We found these had been gradually leaking over time (the metric we were watching is node_filefd_allocated from the node_exporters on those instances - the FD limit we set for Cassandra is 100k).
We’d been seeing some issues with Prometheus timing out whilst scraping these nodes, and found that the majority of open FDs were orphaned TCP sockets in CLOSE_WAIT. Thread dumps showed that all 5 JMX exporter threads on these nodes seemed to be stuck writing to the socket:
"pool-1-thread-1" - Thread t@84
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:65)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
- locked <2e084219> (a java.lang.Object)
at sun.net.httpserver.Request$WriteStream.write(Request.java:391)
- locked <368dd754> (a sun.net.httpserver.Request$WriteStream)
at sun.net.httpserver.ChunkedOutputStream.writeChunk(ChunkedOutputStream.java:125)
at sun.net.httpserver.ChunkedOutputStream.write(ChunkedOutputStream.java:87)
at sun.net.httpserver.PlaceholderOutputStream.write(ExchangeImpl.java:444)
at java.util.zip.DeflaterOutputStream.deflate(DeflaterOutputStream.java:253)
at java.util.zip.DeflaterOutputStream.write(DeflaterOutputStream.java:211)
at java.util.zip.GZIPOutputStream.write(GZIPOutputStream.java:145)
at java.io.ByteArrayOutputStream.writeTo(ByteArrayOutputStream.java:167)
- locked <fd8ef5> (a java.io.ByteArrayOutputStream)
at io.prometheus.jmx.shaded.io.prometheus.client.exporter.HTTPServer$HTTPMetricHandler.handle(HTTPServer.java:74)
at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:79)
at sun.net.httpserver.AuthFilter.doFilter(AuthFilter.java:83)
at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:82)
at sun.net.httpserver.ServerImpl$Exchange$LinkHandler.handle(ServerImpl.java:675)
at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:79)
at sun.net.httpserver.ServerImpl$Exchange.run(ServerImpl.java:647)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Locked ownable synchronizers:
- locked <467e300a> (a java.util.concurrent.ThreadPoolExecutor$Worker)
Putting these two bits of information together gives us this theory:
- Prometheus scrapes the node, sending an HTTP request to the JMX exporter
- The JMX exporter collects metrics, but takes a long time to do so (this is occasionally expected in our case, as our nodes export thousands of JMX metrics)
- Prometheus reaches the scrape timeout and cancels the request, closing its end of the connection with a TCP FIN
- The JMX exporter finishes collecting metrics and attempts to write the output to the socket (https://github.com/prometheus/client_java/blob/parent-0.3.0/simpleclient_httpserver/src/main/java/io/prometheus/client/exporter/HTTPServer.java#L78), but the other side of the TCP connection has already been closed
- The write blocks forever, and we never reach https://github.com/prometheus/client_java/blob/parent-0.3.0/simpleclient_httpserver/src/main/java/io/prometheus/client/exporter/HTTPServer.java#L80, which closes the socket
It looks like simpleclient_httpserver doesn’t have good semantics around handling closed connections.
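For what it’s worth, this is roughly the kind of defensive handling we’d expect on the server side. It’s a minimal sketch against the JDK’s built-in com.sun.net.httpserver rather than the actual client_java handler, and the sun.net.httpserver.maxReqTime/maxRspTime properties are our assumption about how one might bound a write to a dead peer - we haven’t verified that they cover this exact case:

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;
import com.sun.net.httpserver.HttpServer;

import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Executors;

public class DefensiveMetricsHandler implements HttpHandler {

    @Override
    public void handle(HttpExchange exchange) throws IOException {
        // Stand-in for the slow part: in the real exporter this is where
        // thousands of MBeans get scraped.
        byte[] body = scrapeMetrics();
        try {
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        } catch (IOException e) {
            // The client went away mid-write: give up rather than retry.
        } finally {
            // Always release the exchange (and its socket), even if the write
            // threw. This alone does not help if the write blocks, because we
            // never reach this block in that case - hence the timeouts below.
            exchange.close();
        }
    }

    private byte[] scrapeMetrics() {
        return "jvm_up 1\n".getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Bound how long the JDK HttpServer spends reading a request and
        // writing a response (values are in seconds), so a dead peer cannot
        // pin a handler thread forever. Must be set before the server starts.
        System.setProperty("sun.net.httpserver.maxReqTime", "60");
        System.setProperty("sun.net.httpserver.maxRspTime", "600");

        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 3);
        server.createContext("/metrics", new DefensiveMetricsHandler());
        server.setExecutor(Executors.newFixedThreadPool(5)); // mirrors the 5 exporter threads
        server.start();
    }
}
```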
We don’t have a minimal reproduction of this, but tcpdumps back the theory up. We’re considering forking the jmx_exporter to use simpleclient_jetty instead, but we wondered if anyone else had come across this?
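A rough sketch of what that fork might look like - this pairs the existing JmxCollector with simpleclient_servlet’s MetricsServlet behind an embedded Jetty server, and the class names and constructors here are from memory rather than tested code:

```java
import io.prometheus.client.exporter.MetricsServlet;
import io.prometheus.jmx.JmxCollector;

import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.servlet.ServletContextHandler;
import org.eclipse.jetty.servlet.ServletHolder;

import java.io.File;

public class JettyJmxExporter {
    public static void main(String[] args) throws Exception {
        int port = Integer.parseInt(args[0]);
        File config = new File(args[1]);

        // Register the JMX collector with the default Prometheus registry.
        new JmxCollector(config).register();

        // Expose /metrics via Jetty instead of com.sun.net.httpserver.
        Server server = new Server(port);
        ServletContextHandler context = new ServletContextHandler();
        context.setContextPath("/");
        context.addServlet(new ServletHolder(new MetricsServlet()), "/metrics");
        server.setHandler(context);

        server.start();
        server.join();
    }
}
```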
@brian-brazil We moved the main JMX exporter out of the Cassandra process to an external HTTP server because we couldn’t afford the FD leaks and restarting Cassandra every once in a while. We now run 2 copies of the JMX exporter: the in-process version scrapes only the minimal JVM metrics, and the external exporter scrapes the much more detailed Cassandra metrics. Since it’s an external process that gets restarted automatically by systemd if it crashes, we haven’t really looked into it since. It’s not an optimal solution, but it saved us from the problematic DB restarts.
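For anyone copying this setup, the unit file is roughly along these lines (paths, port and config file name are illustrative rather than our exact values):

```ini
# /etc/systemd/system/jmx-exporter.service (illustrative)
[Unit]
Description=Standalone JMX exporter for Cassandra
After=network.target

[Service]
# jmx_prometheus_httpserver takes <[host:]port> and a YAML config file.
ExecStart=/usr/bin/java -jar /opt/jmx_exporter/jmx_prometheus_httpserver-0.3.1-jar-with-dependencies.jar 9100 /etc/jmx_exporter/cassandra.yml
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
```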
Yes of course - I’ve clarified in the description, thanks.
I don’t have netstat output for the affected nodes, though I’ll grab it when the issue reoccurs. We did analyse the leak with lsof, and found thousands of CLOSE_WAIT entries where hostname-a (the hostname of the Cassandra node we ran this on) had a connection to hostname-b (one of the Prometheus hosts). All of the connections in CLOSE_WAIT were to a Prometheus host, and we’re exposing the JMX exporter on the http-alt port (8080), so these are definitely connections handled by the JMX exporter.