question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

"Lost connection to socket" every 5 seconds

See original GitHub issue

What version of gRPC and what language are you using?

C# 2.53.0

What operating system (Linux, Windows,…) and version?

Issue observed both Windows 11 and Linux

What runtime / compiler are you using (e.g. .NET Core SDK version dotnet --info)

.NET 6.0.408

What did you do/see/expect?

I have a .NET client application talking to a Java gRPC server. My custom resolver was being unexpectedly called every 5 seconds so after enabling trace level logs on grpc-dotnet and adding logging for HTTP I see the following message occurring every 5 seconds is there is no traffic between the socket connecting and the first time SocketConnectivitySubchannelTransport.OnCheckSocketConnection is called:

frontend-58846576d8-8z8jd frontend 2023-05-31T12:21:23.459206138-07:00 trce: Grpc.Net.Client.Balancer.Internal.SocketConnectivitySubchannelTransport[4]
frontend-58846576d8-8z8jd frontend 2023-05-31T12:21:23.459255541-07:00       Subchannel id '1' checking socket 10.240.7.121:5001.
frontend-58846576d8-8z8jd frontend 2023-05-31T12:21:23.459264942-07:00 ############## [EventSource(Private.InternalDiagnostics.System.Net.Sockets, ae391de7-a2cb-557c-dd34-fe00d0b98c7f)] [Info] thisOrContextObject: Socket#43624646, memberName: Poll, message: Poll returns socketCount:0
frontend-58846576d8-8z8jd frontend 2023-05-31T12:21:23.459271942-07:00 ############## [EventSource(Private.InternalDiagnostics.System.Net.Sockets, ae391de7-a2cb-557c-dd34-fe00d0b98c7f)] [Info] thisOrContextObject: Socket#43624646, memberName: Dispose, message: timeout = -1
frontend-58846576d8-8z8jd frontend 2023-05-31T12:21:23.459278243-07:00 ############## [EventSource(Private.InternalDiagnostics.System.Net.Sockets, ae391de7-a2cb-557c-dd34-fe00d0b98c7f)] [Info] thisOrContextObject: Socket#43624646, memberName: Dispose, message: disposing:True Disposed:False
frontend-58846576d8-8z8jd frontend 2023-05-31T12:21:23.459289243-07:00 ############## [EventSource(Private.InternalDiagnostics.System.Net.Sockets, ae391de7-a2cb-557c-dd34-fe00d0b98c7f)] [Info] thisOrContextObject: Socket#43624646, memberName: Dispose, message: Calling _handle.CloseAsIs()
frontend-58846576d8-8z8jd frontend 2023-05-31T12:21:23.459330846-07:00 ############## [EventSource(Private.InternalDiagnostics.System.Net.Sockets, ae391de7-a2cb-557c-dd34-fe00d0b98c7f)] [Info] thisOrContextObject: SafeSocketHandle#46732916, memberName: CloseAsIs, message: shouldClose=True
frontend-58846576d8-8z8jd frontend 2023-05-31T12:21:23.459337746-07:00 ############## [EventSource(Private.InternalDiagnostics.System.Net.Sockets, ae391de7-a2cb-557c-dd34-fe00d0b98c7f)] [Info] thisOrContextObject: SafeSocketHandle#46732916, memberName: ReleaseHandle, message: shouldClose=False
frontend-58846576d8-8z8jd frontend 2023-05-31T12:21:23.459343647-07:00 ############## [EventSource(Private.InternalDiagnostics.System.Net.Sockets, ae391de7-a2cb-557c-dd34-fe00d0b98c7f)] [Info] thisOrContextObject: SafeSocketHandle#46732916, memberName: CloseHandle, message: handle:0x16B
frontend-58846576d8-8z8jd frontend 2023-05-31T12:21:23.459349747-07:00 ############## [EventSource(Private.InternalDiagnostics.System.Net.Sockets, ae391de7-a2cb-557c-dd34-fe00d0b98c7f)] [Info] thisOrContextObject: SafeSocketHandle#46732916, memberName: DoCloseHandle, message: handle:0x16B Following 'non-abortive' branch.
frontend-58846576d8-8z8jd frontend 2023-05-31T12:21:23.459390249-07:00 ############## [EventSource(Private.InternalDiagnostics.System.Net.Sockets, ae391de7-a2cb-557c-dd34-fe00d0b98c7f)] [Info] thisOrContextObject: SafeSocketHandle#46732916, memberName: CloseHandle, message: handle:363, close():SUCCESS
frontend-58846576d8-8z8jd frontend 2023-05-31T12:21:23.460124293-07:00 ############## [EventSource(System.Net.NameResolution, 4b326142-bfb5-5ed3-8585-7714181d14b0)] [ResolutionStart] hostNameOrAddress: fe-e2e-d42b40067b4f4c779181362-az22db33c3-search-svc.gbarberi3
frontend-58846576d8-8z8jd frontend 2023-05-31T12:21:23.460150295-07:00 dbug: Grpc.Net.Client.Balancer.Subchannel[11]
frontend-58846576d8-8z8jd frontend 2023-05-31T12:21:23.460158196-07:00       Subchannel id '1' state changed to Idle. Detail: 'Lost connection to socket.'.

My channel is configured like this:

                var channel = GrpcChannel.ForAddress(
                    address,
                    channelOptions: new GrpcChannelOptions
                    {
                        ThrowOperationCanceledOnCancellation = true,
                        LoggerFactory = services.GetRequiredService<ILoggerFactory>(),
                        Credentials = ChannelCredentials.Insecure,
                        MaxReconnectBackoff = serviceOptions.MaxReconnectBackoff,
                        MaxReceiveMessageSize = serviceOptions.MaxReceiveMessageSize,
                        MaxSendMessageSize = serviceOptions.MaxSendMessageSize,
                        ServiceConfig = new ServiceConfig { LoadBalancingConfigs = { new RoundRobinConfig() } },
                        HttpHandler = new SocketsHttpHandler
                        {
                            ConnectTimeout = serviceOptions.ConnectTimeout, // 30 sec by default
                            EnableMultipleHttp2Connections = true
                        },
                        // Required to dispose of manually created SocketsHttpHandler above
                        DisposeHttpClient = true,
                        // Create a service provider to inject the activation manager into the resolver factory
                        ServiceProvider = new ServiceCollection()
                            .AddSingleton<ResolverFactory, IndexServiceReaderResolverFactory>()
                            .AddSingleton<ResolverFactory, IndexServiceWriterResolverFactory>()
                            .BuildServiceProvider()
                    }
                );

I added a breakpoint to SocketConnectivitySubchannelTransport and see that IsSocketInBadState returns true when the transport is polling for status every 5 seconds which causes the transport to then disconnect.

I also added a breakpoint right after the connection is established in SocketConnectivitySubchannelTransport.TryConnectAsync here:

SocketConnectivitySubchannelTransportLog.ConnectingSocket(_logger, _subchannel.Id, currentAddress);
await _socketConnect(socket, currentAddress.EndPoint, context.CancellationToken).ConfigureAwait(false);
SocketConnectivitySubchannelTransportLog.ConnectedSocket(_logger, _subchannel.Id, currentAddress);

Calling IsSocketInBadState(socket) in the breakpoint above is returning true. The socket is healthy but there is data pending (socket.Available==40):

  {System.Net.Sockets.Socket}
    AddressFamily: InterNetworkV6
    Available: 40
    Blocking: true
    Connected: true
    DontFragment: Exception of type 'System.NotSupportedException' was thrown
    DualMode: true
    EnableBroadcast: Exception of type 'System.Net.Sockets.SocketException' was thrown
    ExclusiveAddressUse: false
    Handle: 0xd38
    IsBound: true
    LingerState: {System.Net.Sockets.LingerOption}
    LocalEndPoint: {[::ffff:127.0.0.1]:64546}
    MulticastLoopback: Exception of type 'System.Net.Sockets.SocketException' was thrown
    NoDelay: true
    ProtocolType: Tcp
    ReceiveBufferSize: 65536
    ReceiveTimeout: 0
    RemoteEndPoint: null
    SafeHandle: {System.Net.Sockets.SafeSocketHandle}
    SendBufferSize: 65536
    SendTimeout: 0
    SocketType: Stream
    Ttl: 128
    UseOnlyOverlappedIO: false

The IsSocketInBadState function will return true if there is data pending according to the comments and socket.Poll docs:

    private static bool IsSocketInBadState(Socket socket)
    {
        try
        {
            // Will return true if closed or there is pending data for some reason.
            return socket.Poll(0, SelectMode.SelectRead);
        }
        catch (Exception e) when (e is SocketException || e is ObjectDisposedException)
        {
            return false;
        }
    }

The 40 bytes in the socket are 00001204000000000000037FFFFFFF000400100000000600002000000004080000000000000F0001 which I don’t know what they mean but searching for those bytes I see references to java gRPC service.

Issue Analytics

  • State:open
  • Created 4 months ago
  • Reactions:3
  • Comments:8 (4 by maintainers)

github_iconTop GitHub Comments

3reactions
Wolffy101commented, Jul 19, 2023

Connecting to Java grpc immediately returns 40 bytes [0,0,18,4,0,0,0,0,0,0,3,127,255,255,255,0,4,0,16,0,0,0,6,0,0,32,0,0,0,4,8,0,0,0,0,0,0,15,0,1] image image connect to C-core-based gRPC C# Server return 46 bytes Server code image

1reaction
JamesNKcommented, Jul 20, 2023

I tested the 46 byte result from Grpc.Core is, and it’s an HTTP/2 SETTINGS frame.

I think the PR at https://github.com/grpc/grpc-dotnet/pull/2179 is on the right track, but it can be improved to cache the initial frame and replay it to the client once it needs a connection.

Read more comments on GitHub >

github_iconTop Results From Across the Web

javascript - Handling connection loss with websockets
The connection to the websocket server occurs within a Firefox addon, ... after a connection is lost and keep trying after every 5...
Read more >
Troubleshooting connection issues
So if you are experiencing a regular disconnection after 30 seconds (which was the sum of the values of pingTimeout and pingInterval in...
Read more >
When TCP sockets refuse to die
Our connection settings: TCP_USER_TIMEOUT = 5*1000 - 5 seconds; SO_KEEPALIVE = 1 - enable keepalives; TCP_KEEPIDLE = 1 - send first probe ...
Read more >
Hub access times out with: Error Connection lost.
Enable TCP WebSocket Keep-Alive and HTTP keep-alive timeout. Check to see if this issue occurs when using Internet Explorer. Internet Explorer ...
Read more >
how to fix internal exception java.net.socketexception ...
Try restarting your router, then your computer. If on wifi, delete the wifi connection and reconnect.
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found