Bidi calls never close, leaking memory client and server
Using v1.8.0.
The following client/server leaves the bidi RPC call open, holding onto resources and ultimately going OOM if enough calls are made:
// proto
message Request {}
message Response {}
service TestService {
  rpc Ping(stream Request) returns (stream Response);
}
/////////////////////////////
// server impl
private static class ServerServiceImpl extends TestServiceGrpc.TestServiceImplBase {
  @Override
  public StreamObserver<TestRpc.Request> ping(StreamObserver<TestRpc.Response> responseObserver) {
    return new StreamObserver<TestRpc.Request>() {
      @Override
      public void onNext(TestRpc.Request request) {
        // Reply once and immediately complete the server side of the stream.
        responseObserver.onNext(TestRpc.Response.newBuilder().build());
        responseObserver.onCompleted();
      }
      @Override public void onError(Throwable throwable) {}
      @Override public void onCompleted() {}
    };
  }
}
/////////////////////////////
// client impl
private void oneCall(Channel chan) {
  ClientCall<TestRpc.Request, TestRpc.Response> call =
      chan.newCall(TestServiceGrpc.getPingMethod(), CallOptions.DEFAULT);
  call.start(new ClientCall.Listener<TestRpc.Response>() {}, new Metadata());
  call.sendMessage(TestRpc.Request.newBuilder().build());
  call.request(1);
  // No call.halfClose() here, so the client never completes its side of the stream.
}

for (int i = 0; i < 1000; ++i) {
  oneCall(channel);
}
// If I attach a debugger to the client here, I can see 1000 instances of DefaultStream and 1000 instances of a bunch of other grpc/netty bits, even after > 5 minutes and repeated attempts at GC.
Thread.sleep(9999999);
Replacing the client’s ClientCall.Listener with one that calls .halfClose() upon completion works around the issue.
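The workaround code itself is not shown in the report; below is a minimal sketch of what it might look like, assuming the half-close is issued from the listener’s onClose() callback (that placement is an assumption, not something stated in the issue):

// client workaround sketch (halfClose() in onClose() is an assumption, not from the report)
private void oneCall(Channel chan) {
  ClientCall<TestRpc.Request, TestRpc.Response> call =
      chan.newCall(TestServiceGrpc.getPingMethod(), CallOptions.DEFAULT);
  call.start(new ClientCall.Listener<TestRpc.Response>() {
    @Override
    public void onClose(io.grpc.Status status, Metadata trailers) {
      // Tell the transport the client is done sending so the stream can be retired.
      call.halfClose();
    }
  }, new Metadata());
  call.sendMessage(TestRpc.Request.newBuilder().build());
  call.request(1);
}

Half-closing right after the last sendMessage() (i.e. call.halfClose() at the end of oneCall) should achieve the same effect without waiting for the server to finish.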
Top GitHub Comments
In HTTP (and gRPC) the bi-di stream is done once the server finishes responding. We should simply throw away anything the client tries to send past that point (which is core to our design).
I would fully believe okhttp has the same bug. In fact, I’d expect okhttp to have it before netty.
https://github.com/grpc/grpc-java/pull/4222 contains a fix for this problem: it changes the Netty client to send a reset stream when it receives an end-of-stream from the server without having already half-closed. There are more details in the comments on #4222, but sending the reset stream frees the stream resources on the client, and receipt of the reset stream frees the stream resources on the server.
Since we can’t assume all clients will be updated to send the reset stream, I’ll need to send out another PR to let the server release stream resources even without this behavior change in the client. But the client updates in #4222 alone are enough to solve this problem, at least as far as I’ve been able to reproduce and test it.
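For completeness, the generated async stub avoids this pattern entirely: calling onCompleted() on the request observer half-closes the underlying ClientCall, so the stream is retired on both sides once the server finishes. A minimal sketch against the proto above (stub and observer names follow standard grpc-java codegen):

// client via generated async stub (sketch)
TestServiceGrpc.TestServiceStub stub = TestServiceGrpc.newStub(channel);
StreamObserver<TestRpc.Request> requests =
    stub.ping(new StreamObserver<TestRpc.Response>() {
      @Override public void onNext(TestRpc.Response response) {}
      @Override public void onError(Throwable t) {}
      @Override public void onCompleted() {}
    });
requests.onNext(TestRpc.Request.newBuilder().build());
requests.onCompleted(); // half-closes the underlying ClientCall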