Hopeless SSL failures not reported to client implementation

Hi folks, I’m writing an application that relies on a long-lived bi-directional stream of messages, with the client-server channel running over mTLS, and hitting a bit of a problem.

On startup the client begins a bidirectionally streaming RPC to the server, constructing a stream and sending a “register” message over it. The server does not send a “register acknowledge” message, and I’m unfortunately not in a position to modify the protocol here, only my implementation. This bidirectionally streaming RPC is kept running for the lifetime of the application (until either client or server crashes, which ideally happens rarely).

The certificates used on the client side expire and are re-issued frequently (as often as every 5 minutes if a customer running this application decides that’s required), and we’ve written our client to re-build the underlying channel the RPC is being made over should the RPC end (e.g. if the server crashes and the client’s StreamObserver::onError is called).

We’re finding that, if the following occurs, our client is getting hung forever attempting to connect to the server with a set of certificates the server will never accept:

  • Server crashes, client is disconnected and re-builds channel, taking in newly-issued set of certificates (cert-set A)
  • Server remains offline while another new set of certificates is issued (cert-set B, cert-set A now expired)
  • Server comes online, begins processing connection requests
  • Client now in a state where it believes the initial register message has been processed, but actually the underlying connection is faulty and the channel is stuck in a retry loop forever.

Obviously using something like a deadline isn’t an option here, due to the protocol design.

I suspect the correct approach is to rely on something like a DelegatingSslContext and call `DelegatingSslContext::update` on detected certificate re-issue. However, it would be great if you happen to know a way for this particular failure to be detectable purely through the `StreamObserver` interface on the client side?

It seems like ideally it should be possible to detect this kind of “hopeless” situation (at a base level, the client certificate is expired, so its notAfter will evaluate to a time in the past, which should be enough to be able to say the handshake will never succeed), but I understand it’s tricky - perhaps an SSLException or IOException passed to onError would be appropriate, but I’m not sure.
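As a rough illustration of the notAfter check described above, here is a minimal JDK-only sketch. The class name `CertExpiryCheck` and its methods are hypothetical and not part of gRPC or Netty. It only compares the certificate's notAfter against the local clock, so, as noted later in the thread, it cannot prove the handshake is hopeless if the server's clock differs.

```java
import java.security.cert.X509Certificate;
import java.time.Clock;
import java.time.Instant;

// Hypothetical helper, not a gRPC or Netty API.
final class CertExpiryCheck {
    private CertExpiryCheck() {}

    // A certificate is valid up to and including notAfter,
    // so it counts as expired only when notAfter is strictly before "now".
    public static boolean isExpired(Instant notAfter, Instant now) {
        return notAfter.isBefore(now);
    }

    // Convenience overload reading notAfter from a loaded certificate.
    // The Clock is injected so the check can be tested with a fixed time.
    public static boolean isExpired(X509Certificate cert, Clock clock) {
        return isExpired(cert.getNotAfter().toInstant(), clock.instant());
    }
}
```

A client could run such a check before (re)building the channel and treat a positive result as a hint to reload its security material, rather than as proof that the server will reject the handshake.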

I’ve included a reproducing case below, with the caveat that rather than creating a client certificate and having it expire, the reproducing case simply has the server require client certificates and then has the client not send any - a similarly “hopeless” case, but without any tricky timing shenanigans. To reproduce:

  • Run “genSecurityContext.sh” to generate a certificate authority and a server cert/key pair signed by that authority
  • Modify the constant “SEC_MATERIAL” to point to wherever you ran “genSecurityContext.sh”
  • Run the application via Main::main(), and note the client is never notified of the permanently broken netty channel

The case is packaged as a maven project, for the sake of convenience, but if you’re (understandably) leery about unzipping random files, the bulk of the logic is:

package com.ericsson.test;

import io.grpc.ManagedChannel;
import io.grpc.Server;
import io.grpc.netty.shaded.io.grpc.netty.GrpcSslContexts;
import io.grpc.netty.shaded.io.grpc.netty.NettyChannelBuilder;
import io.grpc.netty.shaded.io.grpc.netty.NettyServerBuilder;
import io.grpc.netty.shaded.io.netty.handler.ssl.ClientAuth;
import io.grpc.netty.shaded.io.netty.handler.ssl.SslContext;
import io.grpc.netty.shaded.io.netty.handler.ssl.SslContextBuilder;
import io.grpc.stub.StreamObserver;

import javax.net.ssl.SSLException;
import java.io.File;
import java.io.IOException;
import java.util.concurrent.TimeUnit;

public class Main {
    private static final String SEC_MATERIAL =
            "C:\\Users\\ebrooli\\Documents\\projects\\ongoing\\GSSUPP-7063\\reproduce_grpc_failure\\src\\main\\resources"; // CHANGEME
    private static final String SERVER_CERT = SEC_MATERIAL + "\\server.pem";
    private static final String SERVER_KEY = SEC_MATERIAL + "\\server.key";
    private static final String CA = SEC_MATERIAL + "\\ca.pem";

    static {
        System.setProperty("java.util.logging.config.file",
                "C:\\Users\\ebrooli\\Documents\\projects\\ongoing\\GSSUPP-7063\\reproduce_grpc_failure\\src\\main\\resources\\logging.properties"); // CHANGEME
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        final var channel = getClientChannel();
        final var clientReceiveStream = new PrintObserver();
        final var clientSendStream = TestGrpc.newStub(channel).withWaitForReady().exchangeStream(clientReceiveStream);
        System.out.println("Sending first message");
        clientSendStream.onNext(Message.newBuilder().setPayload("test").build());
        System.out.println("First message sent");
        final var server = buildServer();
        System.out.println("Server built");
        clientSendStream.onNext(Message.newBuilder().setPayload("test2").build());
        System.out.println("Second message sent");
        server.awaitTermination();
    }

    private static Server buildServer() throws IOException {
        final var service = new TestImpl();
        return NettyServerBuilder.forPort(3000)
                .addService(service)
                .keepAliveTime(4, TimeUnit.SECONDS)
                .keepAliveTimeout(1, TimeUnit.SECONDS)
                .permitKeepAliveTime(10, TimeUnit.SECONDS)
                .permitKeepAliveWithoutCalls(true)
                .sslContext(getSslContext()).build().start();
    }

    private static ManagedChannel getClientChannel() throws SSLException {
        return NettyChannelBuilder
                .forAddress("localhost", 3000)
                .keepAliveTime(2, TimeUnit.MINUTES)
                .keepAliveTimeout(10, TimeUnit.SECONDS)
                .keepAliveWithoutCalls(true)
                .disableRetry()
                .sslContext(getClientContext()).build();
    }

    private static SslContext getSslContext() throws SSLException {
        final SslContextBuilder builder = GrpcSslContexts.forServer(new File(SERVER_CERT), new File(SERVER_KEY));
        // Require the client to use mTLS, then when we don't use mTLS on the client side it looks like a handshake failure
        builder.clientAuth(ClientAuth.REQUIRE);
        return builder.build();
    }

    private static SslContext getClientContext() throws SSLException {
        // We're going to setup the client for failure here by not providing a client cert
        return GrpcSslContexts.forClient().trustManager(new File(CA)).build();
    }

    private static class TestImpl extends TestGrpc.TestImplBase {
        @Override
        public StreamObserver<Message> exchangeStream(StreamObserver<Message> clientStream) {
            return new PrintObserver();
        }
    }

    private static class PrintObserver implements StreamObserver<Message> {

        @Override
        public void onNext(Message message) {
            System.out.println(message);
        }

        @Override
        public void onError(Throwable throwable) {
            System.out.println("onError called");
            throwable.printStackTrace();
        }

        @Override
        public void onCompleted() {
            System.out.println("onCompleted");
        }
    }
}

syntax = "proto3";

option java_multiple_files = true;
option java_package = "com.ericsson.test";
option java_outer_classname = "TestService";

package com.ericsson.test;

service Test {
    rpc exchangeStream(stream Message) returns (stream Message) {}
}

message Message {
    string payload = 1;
}

Thanks, Oliver

hopeless_failures.zip

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 8 (5 by maintainers)

Top GitHub Comments

ejona86 commented, Dec 2, 2021

> For now I think internally we’ll be sticking with polling the security material on disk to watch for re-issue and rebuild the underlying SSL context when appropriate

Feel free to use AdvancedTlsX509KeyManager and AdvancedTlsX509TrustManager. Those can do the polling and swap-out for you.

FYI, you can also use TlsChannelCredentials and TlsServerCredentials these days. They are stable APIs (unlike the Netty-based APIs). gRFC L74 has details about using the channel credentials API.

oliverb123 commented, Dec 2, 2021

Fair point on the external server’s clock being in the past, you’re right that almost all “permanent” failures are extremely difficult to identify as such given the high dependence on some other system.

For now I think internally we’ll be sticking with polling the security material on disk to watch for re-issue and rebuilding the underlying SSL context when appropriate. Manually implementing a sensible retry back-off mechanism is probably not worth the CPU cycles spent doing the modified-timestamp check on 3 files at some reasonably low frequency (for us, for now).
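The modified-timestamp polling mentioned above could look roughly like the following JDK-only sketch. The class name `SecurityMaterialWatcher` and its method are hypothetical. It only reports that a watched file changed; rebuilding the SSL context and the channel is left to the caller.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: remembers the last-modified time of each watched file.
final class SecurityMaterialWatcher {
    private final List<Path> files;
    private final Map<Path, FileTime> lastSeen = new HashMap<>();

    SecurityMaterialWatcher(List<Path> files) throws IOException {
        this.files = List.copyOf(files);
        // Record a baseline so the first poll only reports later changes.
        for (Path file : this.files) {
            lastSeen.put(file, Files.getLastModifiedTime(file));
        }
    }

    // Returns true if any watched file's modified time differs from the
    // value recorded by the previous call, and records the new values.
    public synchronized boolean pollForChange() throws IOException {
        boolean changed = false;
        for (Path file : files) {
            FileTime current = Files.getLastModifiedTime(file);
            FileTime previous = lastSeen.put(file, current);
            if (!current.equals(previous)) {
                changed = true;
            }
        }
        return changed;
    }
}
```

One way to use this would be to call pollForChange() from a ScheduledExecutorService task at a low frequency, passing the three files (for example the CA, client certificate, and client key), and rebuild the SSL context whenever it returns true.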

Thanks for the explanation, I’m going to close this issue here, take it easy.
