Connection reset by peer IOException did not fire

Background

I have a service that uses Netty as a client to continuously collect information from thousands of hosts. I expect Netty to throw an IOException (connection reset by peer) whenever the destination has dropped the connection before the write. This expected behavior occurs in local test cases, but not in our large-scale environment. There, after the connection is dropped, Netty appears to attempt the write and report success, but the channel is closed immediately afterwards, which leaves my application hanging on the assumption that the request is still in progress. The only way I could identify the gap was to run tcpdump and sample those requests. Here is a simplified capture of one such request:

Relative Time  Source    Destination  Protocol  Info
0.011689       10.0.0.1  10.0.0.2     TCP       [SYN] Seq=0 Len=0
0.012461       10.0.0.2  10.0.0.1     TCP       [SYN, ACK] Seq=0 Ack=1 Len=0
0.12480        10.0.0.1  10.0.0.2     TCP       [ACK] Seq=1 Ack=1 Len=0
1.659652       10.0.0.2  10.0.0.1     TCP       [FIN, ACK] Seq=1 Ack=1 Len=0
1.661622       10.0.0.1  10.0.0.2     TCP       [ACK] Seq=1 Ack=2 Len=0
12.484883      10.0.0.1  10.0.0.2     TCP       [PSH, ACK] Seq=1 Ack=2 Len=14
12.484921      10.0.0.1  10.0.0.2     TCP       [FIN, ACK] Seq=15 Ack=2 Len=0
12.485665      10.0.0.2  10.0.0.1     TCP       [RST] Seq=2 Len=0
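
The capture shows the peer sending FIN well before the client pushes its 14 bytes, and the RST arriving only after that write has left the client. The same sequence can be tried with plain java.net sockets. This is a minimal sketch, not the reporter's code: the class name HalfClosedWriteDemo and the 200 ms delay are assumptions made for illustration, and the outcome of the write depends on the operating system's TCP stack.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class HalfClosedWriteDemo {

    /**
     * Connects to a local server that closes the accepted socket right away,
     * then writes once the peer's FIN has had time to arrive.
     * Returns a short description of what the write did.
     */
    static String writeAfterPeerClose() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {

            // Server side: accept and close immediately, which sends a FIN to the client.
            server.accept().close();

            // Give the FIN time to reach the client (arbitrary delay for this sketch).
            Thread.sleep(200);

            OutputStream out = client.getOutputStream();
            try {
                // The write only hands the bytes to the local TCP stack.
                out.write("Message".getBytes(StandardCharsets.US_ASCII));
                out.flush();
                return "write returned normally";
            } catch (IOException e) {
                return "write failed: " + e.getMessage();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeAfterPeerClose());
    }
}
```

On many systems the first write after the peer's FIN is accepted locally, and the reset only becomes visible on a later read or write. That would match the trace, but the sketch reports whichever outcome it observes rather than assuming one.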

After this observation, I set a listener to validate my assumption that channels were being closed before any exception was thrown.

channel.closeFuture().addListener(new ChannelFutureListener() {
    @Override
    public void operationComplete(ChannelFuture future) throws Exception {
        // Logic that keeps the application from hanging on this request.
    }
});

At this point I had confirmed that the channel is closed after the write even though Netty reports the write as successful. I then used this logic to develop a small test application that reproduces the issue described above.

Sample Test Case

I implemented a test scenario that reproduces the situation described above; unfortunately, I cannot share our actual code due to confidentiality. I am very new to Netty (~2 weeks), so I apologize in advance if I am doing this completely wrong.

import io.netty.bootstrap.Bootstrap;
import io.netty.channel.Channel;
import io.netty.channel.ChannelFuture;
import io.netty.channel.ChannelFutureListener;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.SimpleChannelInboundHandler;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.nio.NioSocketChannel;
import io.netty.handler.codec.string.StringDecoder;
import io.netty.handler.codec.string.StringEncoder;

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.function.Consumer;

public class Test {

    public static void main(String[] args) throws IOException, ExecutionException, InterruptedException {
        Server server = new Server();
        new Thread(server).start();

        Client client = new Client();

        CompletableFuture<String> future = client.process();

        server.countDownLatch.await();
        client.countDownLatch.countDown();

        future.get();
        System.out.println("Done!");
    }

    public static class Client {
        public CountDownLatch countDownLatch = new CountDownLatch(1);

        public CompletableFuture<String> process() {
            NioEventLoopGroup eventLoopGroup = new NioEventLoopGroup();
            CompletableFuture<String> future = new CompletableFuture<>();
            try {

                ChannelFuture connection = new Bootstrap()
                        .group(eventLoopGroup)
                        .channel(NioSocketChannel.class)
                        .handler(new ChannelInitializer<Channel>() {
                            @Override
                            protected void initChannel(Channel channel) throws Exception {
                                channel.pipeline().addLast(new StringDecoder());
                                channel.pipeline().addLast(new StringEncoder());

                                channel.pipeline().addLast(
                                        new SimpleChannelInboundHandler<Object>() {

                                            @Override
                                            protected void channelRead0(ChannelHandlerContext ctx, Object o) throws Exception {
                                                /* process response */
                                                future.complete("Done");
                                                ctx.channel().close();
                                            }

                                            @Override
                                            public void exceptionCaught(ChannelHandlerContext ctx, Throwable ex) {
                                                future.completeExceptionally(ex);
                                                ctx.channel().close();
                                            }
                                        }
                                );
                            }
                        }).connect(new InetSocketAddress("localhost", 5555));

                connection.addListener((ChannelFuture channelFuture) -> {
                    if (channelFuture.isSuccess()) {
                        String message = "Message";
                        try {
                            countDownLatch.await(); /* wait until the server has closed the connection; this blocks the event loop thread, which is tolerable only in this repro */
                        } catch (InterruptedException e) {
                            e.printStackTrace();
                        }
                        channelFuture.channel()
                                .writeAndFlush(message)
                                .addListener((ChannelFuture writeFuture) -> {
                                    if (!writeFuture.isSuccess() || writeFuture.cause() != null) {
                                        future.completeExceptionally(new Exception("Failed to write"));
                                    }
                                })
                                // Close the channel once the write completes, as was observed after the write in production.
                                .addListener(ChannelFutureListener.CLOSE);
                    } else {
                        future.completeExceptionally(new Exception("Failed to connect"));
                    }
                });
            } catch (Exception e) {
                future.completeExceptionally(e);
            }

            return future;

        }

    }


    public static class Server implements Runnable {

        private ServerSocket serverSocket;
        public CountDownLatch countDownLatch = new CountDownLatch(1);

        public Server() throws IOException {
            this.serverSocket = new ServerSocket(5555);
        }

        @Override
        public void run() {
            try {
                Socket socket = serverSocket.accept();
                Thread.sleep(500);
                socket.close();
                countDownLatch.countDown();
            } catch (IOException | InterruptedException e) {
                e.printStackTrace();
            }
        }
    }
}

Netty version

Netty 4.1.48.Final

JVM version (e.g. java -version)

openjdk version "11.0.11" 2021-04-20 LTS
OpenJDK Runtime Environment (build 11.0.11+9-LTS)
OpenJDK 64-Bit Server VM (build 11.0.11+9-LTS, mixed mode)

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 9 (6 by maintainers)

Top GitHub Comments

1 reaction
pedramsouri commented, May 13, 2021

Great advice, @NiteshKant. I will definitely give this a try and update this thread if I learn anything new. Thank you for chiming in.

Maybe we can have a separate discussion about changing write so that it does not rely only on the local result from the FD but also waits for the server's ACK. As a user, when I use TCP, I expect the library to follow the protocol completely. I’m sure there would be tradeoffs here; I am still trying to learn more about Netty.

1 reaction
NiteshKant commented, May 13, 2021

> Then isn’t it reasonable to expect java.io.IOException: Connection reset by peer being thrown after write, for the read?

If the connection is fully closed, then eventually read will throw the exception that you mention. To be clear, the timing of the write does not matter for the exception in read.

> cause in that way I can rely on my channel pipeline to catch that exception and complete the Future.

If you just want to listen for channel closure, you can implement the channelInactive() method in your handler, or listen to the channel's closeFuture().
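
A minimal sketch of the channelInactive() suggestion, assuming Netty 4.1 is on the classpath. The names CloseAwareHandlerDemo and CloseAwareHandler are hypothetical and do not come from the thread. The handler fails the pending future when the channel goes inactive, so a request cannot hang after a silent close. EmbeddedChannel is used only to simulate the closure in memory, without a real socket.

```java
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.SimpleChannelInboundHandler;
import io.netty.channel.embedded.EmbeddedChannel;

import java.io.IOException;
import java.util.concurrent.CompletableFuture;

class CloseAwareHandlerDemo {

    /** Hypothetical handler: completes the future on a response, an error, or channel closure. */
    static final class CloseAwareHandler extends SimpleChannelInboundHandler<Object> {
        private final CompletableFuture<String> result;

        CloseAwareHandler(CompletableFuture<String> result) {
            this.result = result;
        }

        @Override
        protected void channelRead0(ChannelHandlerContext ctx, Object msg) {
            result.complete(String.valueOf(msg));
            ctx.close();
        }

        @Override
        public void channelInactive(ChannelHandlerContext ctx) throws Exception {
            // Has no effect if a response or an exception already completed the future.
            result.completeExceptionally(new IOException("Channel closed before a response arrived"));
            super.channelInactive(ctx);
        }

        @Override
        public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause) {
            result.completeExceptionally(cause);
            ctx.close();
        }
    }

    /** Simulates a closed connection with an in-memory channel and reports whether the future failed. */
    static boolean failsWhenChannelCloses() {
        CompletableFuture<String> result = new CompletableFuture<>();
        EmbeddedChannel channel = new EmbeddedChannel(new CloseAwareHandler(result));
        channel.close(); // triggers channelInactive() on the handler
        return result.isCompletedExceptionally();
    }

    public static void main(String[] args) {
        System.out.println(failsWhenChannelCloses());
    }
}
```

The closeFuture() route mentioned in the comment could complete the same future from a listener instead. Putting the logic in the handler keeps the response, error, and closure paths in one place, which is a design preference rather than a requirement.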
