Client doesn't reconnect with TLS

My client code:

final ConnectionFactory cf = new ConnectionFactory(natsAddresses);
cf.setMaxReconnect(-1); // reconnect forever

// set callbacks
cf.setClosedCallback(event -> log.warn("NATS connection was closed."));
cf.setDisconnectedCallback(event -> log.warn("NATS disconnected."));
cf.setReconnectedCallback(event -> log.error("NATS reconnected to {}.", event.getConnection().getConnectedUrl()));

// prepare encryption
try {
    final InputStream truststore = resourceLoader.getResource("file:".concat(config.getNats().getTlsTruststore())).getInputStream();
    final KeyStore ks = KeyStore.getInstance(KeyStore.getDefaultType());
    ks.load(truststore, config.getNats().getTlsTruststorePassword().toCharArray());
    final TrustManagerFactory tmf = TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
    tmf.init(ks);
    final SSLContext sslContext = SSLContext.getInstance(Constants.DEFAULT_SSL_PROTOCOL);
    sslContext.init(null, tmf.getTrustManagers(), new SecureRandom());
    cf.setSecure(true);
    cf.setSSLContext(sslContext);
} catch (IOException | CertificateException | NoSuchAlgorithmException | KeyStoreException | KeyManagementException e) {
    log.error("There was an error while setting up NATS TLS. Ignoring TLS..", e);
}

cf.createConnection();
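
A side note on the TLS setup above: the truststore stream is opened but never closed. The following is a minimal sketch of the same setup using only JDK classes, with the stream closed via try-with-resources. The class name `TrustStoreTls`, the method `sslContextFrom`, and its parameters are hypothetical and not part of jnats; the protocol is passed in explicitly rather than read from `Constants.DEFAULT_SSL_PROTOCOL`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.security.SecureRandom;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

/** Hypothetical helper (not part of jnats): builds an SSLContext from a truststore file. */
final class TrustStoreTls {
    private TrustStoreTls() {}

    static SSLContext sslContextFrom(Path truststore, char[] password, String protocol)
            throws IOException, GeneralSecurityException {
        KeyStore ks = KeyStore.getInstance(KeyStore.getDefaultType());
        // try-with-resources closes the stream even if load() throws
        try (InputStream in = Files.newInputStream(truststore)) {
            ks.load(in, password);
        }
        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(ks);
        // No key managers (null): the client only verifies the server, as in the snippet above
        SSLContext ctx = SSLContext.getInstance(protocol);
        ctx.init(null, tmf.getTrustManagers(), new SecureRandom());
        return ctx;
    }
}
```

The returned context would be handed to `cf.setSSLContext(...)` exactly as in the original snippet. Letting the exceptions propagate, instead of logging and continuing, would also avoid silently falling back to a connection without TLS.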

Now, the connection works and I have thousands of subscriptions and publishers working for days.

But at a certain moment in time, the client disconnects and never reconnects. I’m sorry about the log format, but I’m using a Logstash-like encoder for further log processing. Also, the entries are ordered from newest to oldest:

NATS connection lost.
{"log":"\t... 5 common frames omitted\n"}
{"log":"\tat sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123)\n"}
{"log":"\tat sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:847)\n"}
{"log":"\tat sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:876)\n"}
{"log":"\tat sun.security.ssl.OutputRecord.write(OutputRecord.java:417)\n"}
{"log":"\tat sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:431)\n"}
{"log":"\tat java.net.SocketOutputStream.write(SocketOutputStream.java:153)\n"}
{"log":"\tat java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113)\n"}
{"log":"Caused by: java.net.SocketException: Connection reset\n"}
{"log":"\t... 1 common frames omitted\n"}
{"log":"\tat io.nats.client.ConnectionImpl$8.run(ConnectionImpl.java:1172)\n"}
{"log":"\tat io.nats.client.ConnectionImpl.flusher(ConnectionImpl.java:1720)\n"}
{"log":"\tat java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)\n"}
{"log":"\tat java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)\n"}
{"log":"\tat sun.security.ssl.AppOutputStream.write(AppOutputStream.java:128)\n"}
{"log":"\tat sun.security.ssl.SSLSocketImpl.handleException(SSLSocketImpl.java:1815)\n"}
{"log":"\tat sun.security.ssl.SSLSocketImpl.handleException(SSLSocketImpl.java:1870)\n"}
{"log":"\tat sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1906)\n"}
{"log":"\tat sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1949)\n"}
{"log":"\tat sun.security.ssl.Alerts.getSSLException(Alerts.java:208)\n"}
{"log":"Caused by: javax.net.ssl.SSLException: java.net.SocketException: Connection reset\n"}
{"log":"\tat io.nats.client.ConnectionImpl$6.run(ConnectionImpl.java:1141)\n"}
{"log":"\tat io.nats.client.ConnectionImpl$7.run(ConnectionImpl.java:1163)\n"}
{"log":"\tat io.nats.client.ConnectionImpl.readLoop(ConnectionImpl.java:1374)\n"}
{"log":"\tat io.nats.client.ConnectionImpl.processOpError(ConnectionImpl.java:672)\n"}
{"log":"\tat java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)\n"}
{"log":"\tat java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)\n"}
{"log":"\tat sun.security.ssl.AppOutputStream.write(AppOutputStream.java:71)\n"}
{"log":"\tat sun.security.ssl.SSLSocketImpl.checkWrite(SSLSocketImpl.java:1553)\n"}
{"log":"\tat sun.security.ssl.SSLSocketImpl.checkEOF(SSLSocketImpl.java:1541)\n"}
{"log":"javax.net.ssl.SSLException: Connection has been shutdown: javax.net.ssl.SSLException: java.net.SocketException: Connection reset\n"}
{"log":"2016-07-14 02:18:37.906 ERROR 6 --- [ readloop] io.nats.client.ConnectionImpl : I/O error during flush\n"}
{"log":"\tat io.nats.client.ConnectionImpl$6.run(ConnectionImpl.java:1141)\n"}
{"log":"\tat io.nats.client.ConnectionImpl$8.run(ConnectionImpl.java:1172)\n"}
{"log":"\tat io.nats.client.ConnectionImpl.flusher(ConnectionImpl.java:1720)\n"}
{"log":"\tat java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)\n"}
{"log":"\tat java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)\n"}
{"log":"\tat sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123)\n"}
{"log":"\tat sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:847)\n"}
{"log":"\tat sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:876)\n"}
{"log":"\tat sun.security.ssl.OutputRecord.write(OutputRecord.java:417)\n"}
{"log":"\tat sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:431)\n"}
{"log":"\tat java.net.SocketOutputStream.write(SocketOutputStream.java:153)\n"}
{"log":"\tat java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113)\n"}
{"log":"java.net.SocketException: Connection reset\n"}
{"log":"2016-07-14 02:18:37.902 ERROR 6 --- [ flusher] io.nats.client.ConnectionImpl : I/O eception encountered during flush\n"}

On the server side, again from newest to oldest:

[10] 2016/07/14 02:18:37.933466 [DBG] 10.4.1.5:51340 - cid:8939 - Client connection closed
[10] 2016/07/14 02:18:37.933425 [DBG] 10.4.1.5:51340 - cid:8939 - TLS handshake error: remote error: unexpected message
[10] 2016/07/14 02:18:37.921470 [DBG] 10.4.1.5:51340 - cid:8939 - Starting TLS client connection handshake
[10] 2016/07/14 02:18:37.921413 [DBG] 10.4.1.5:51340 - cid:8939 - Client connection created
[10] 2016/07/14 02:18:34.536575 [DBG] 10.4.1.5:52502 - cid:8931 - Client connection closed
[10] 2016/07/14 02:18:34.536499 [DBG] 10.4.0.5:57733 - cid:8937 - Error flushing: write tcp 10.4.5.5:4222->10.4.1.5:52502: i/o timeout

I wonder why the TLS handshake fails. It seems flushing is happening before the TLS handshake is done.

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments:7 (5 by maintainers)

Top GitHub Comments

1 reaction
pires commented, Aug 4, 2016

I’m willing to help as well.

1 reaction
mcqueary commented, Jul 22, 2016

I do indeed. Looking at the code today. If you’re on Slack, I’d like to chat in real time a bit about this and then post an update here.

On Friday, July 22, 2016, Paulo Pires notifications@github.com wrote:

@mcqueary do you agree this is critical? I mean, the workaround is to not use TLS on NATS 👎.

Larry McQueary | Director, Messaging Technology larry@apcera.com | @mcqueary | github.com/mcqueary
