Server dies on keep-alive ack timeout

The RSocket server dies on a keep-alive ack timeout. I've tried adding onErrorResume, but to no avail. How can I prevent the server from closing its socket on this error?

Error

[2019-05-20 11:12:21.438] ERROR [parallel-1] RegistryRSocketServer: Error occurred during session
io.rsocket.exceptions.ConnectionErrorException: No keep-alive acks for 60000 ms
at io.rsocket.keepalive.KeepAliveConnection.lambda$startKeepAlives$1(KeepAliveConnection.java:97)
at reactor.core.publisher.LambdaMonoSubscriber.onNext(LambdaMonoSubscriber.java:137)
at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1476)
at reactor.core.publisher.MonoProcessor.onNext(MonoProcessor.java:389)
at io.rsocket.keepalive.KeepAliveHandler.doCheckTimeout(KeepAliveHandler.java:112)
at io.rsocket.keepalive.KeepAliveHandler$Server.onIntervalTick(KeepAliveHandler.java:128)
at io.rsocket.keepalive.KeepAliveHandler.lambda$start$0(KeepAliveHandler.java:63)
at reactor.core.publisher.LambdaSubscriber.onNext(LambdaSubscriber.java:130)
at reactor.core.publisher.FluxInterval$IntervalRunnable.run(FluxInterval.java:123)
at reactor.core.scheduler.PeriodicWorkerTask.call(PeriodicWorkerTask.java:59)
at reactor.core.scheduler.PeriodicWorkerTask.run(PeriodicWorkerTask.java:73)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
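
Reading the trace bottom-up, the exception is raised on a parallel scheduler thread by a periodic timer (`FluxInterval` → `KeepAliveHandler$Server.onIntervalTick` → `doCheckTimeout`), not inside any request handler. In other words, the server itself decides that the peer has gone silent and tears the connection down. The sketch below is a deliberately simplified, hypothetical model of that kind of check in plain Java; `KeepAliveMonitor`, `recordKeepAlive` and `tick` are made-up names, and RSocket's exact timing rules may differ.

```java
// Hypothetical, simplified model of a server-side keep-alive check.
// This is NOT RSocket's KeepAliveHandler; all names are illustrative.
final class KeepAliveMonitor {
    private final long timeoutMillis;
    private long lastKeepAliveMillis;
    private boolean closed;

    KeepAliveMonitor(long timeoutMillis, long startMillis) {
        this.timeoutMillis = timeoutMillis;
        this.lastKeepAliveMillis = startMillis;
    }

    /** Record that a keep-alive frame arrived from the peer. */
    void recordKeepAlive(long nowMillis) {
        lastKeepAliveMillis = nowMillis;
    }

    /**
     * Run on every timer tick. Once the peer has been silent for the whole
     * timeout, mark the connection closed and return an error message;
     * otherwise return null.
     */
    String tick(long nowMillis) {
        if (closed) {
            return null; // already closed; nothing further to report
        }
        if (nowMillis - lastKeepAliveMillis >= timeoutMillis) {
            closed = true;
            return "No keep-alive acks for " + timeoutMillis + " ms";
        }
        return null;
    }

    boolean isClosed() {
        return closed;
    }

    public static void main(String[] args) {
        KeepAliveMonitor monitor = new KeepAliveMonitor(60_000, 0);
        monitor.recordKeepAlive(20_000);          // peer alive at t = 20 s
        System.out.println(monitor.tick(40_000)); // null: silent for 20 s
        System.out.println(monitor.tick(80_000)); // silent for 60 s: timeout message
        System.out.println(monitor.isClosed());   // true
    }
}
```

Because the timeout in this model is a deliberate decision made by the timer rather than an exception thrown by user code, an error handler placed on some other pipeline never gets a chance to intervene.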

Version: 0.12.2-RC2

Server Code

server = RSocketFactory
        .receive()
        .frameDecoder(ZERO_COPY)
        .addConnectionPlugin(micrometerDuplexConnectionInterceptor)
        .errorConsumer(e -> log.error("Error occurred during session", e))
        .acceptor(socketAcceptor)
        .transport(serverTransport)
        .start()
        .onErrorResume(e -> Mono.empty())
        .subscribe();
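
The "Error occurred during session" line in the log is exactly the message passed to errorConsumer above, which suggests the keep-alive failure was delivered through that callback rather than through the Mono returned by start(). As far as we can tell, that Mono only signals whether the server started (for example, whether binding succeeded), so an onErrorResume attached to it never sees a single connection closing. The plain-Java model below illustrates that split; `ToyServer` is hypothetical, and `CompletableFuture.exceptionally` stands in for `onErrorResume`. It is not RSocket code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

// Hypothetical model, NOT RSocket code: start() reports only bind failures,
// while failures of individual sessions are routed to a separate errorConsumer.
final class ToyServer {
    private final Consumer<Throwable> errorConsumer;

    ToyServer(Consumer<Throwable> errorConsumer) {
        this.errorConsumer = errorConsumer;
    }

    /** Completes once "bound"; fails only if binding fails. */
    CompletableFuture<String> start(boolean bindSucceeds) {
        return bindSucceeds
                ? CompletableFuture.completedFuture("bound")
                : CompletableFuture.failedFuture(new IllegalStateException("bind failed"));
    }

    /** A session dies after start() has already completed. */
    void failSession(Throwable cause) {
        errorConsumer.accept(cause); // only the error consumer sees this
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        ToyServer server = new ToyServer(e -> log.add("session error: " + e.getMessage()));

        // Plays the role of .start().onErrorResume(...): it only ever sees bind errors.
        CompletableFuture<String> started = server.start(true)
                .exceptionally(e -> {
                    log.add("start error: " + e.getMessage());
                    return "recovered";
                });

        server.failSession(new RuntimeException("No keep-alive acks for 60000 ms"));

        System.out.println(started.join()); // bound
        log.forEach(System.out::println);  // session error: No keep-alive acks for 60000 ms
    }
}
```

In this model the handler on start() never fires because the bind succeeded; the session failure reaches only the error consumer, which matches the single log line in the question.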

Acceptor

@Override
public Mono<RSocket> accept(ConnectionSetupPayload connectionSetupPayload, RSocket rSocket) {
  return Mono.just(new RegistryRSocket(scheduler));
}

requestStream

return Mono.just(payload)
        .map(this::getRequestFromPayload)
        .flux()
        .map(/*..Does Something..*/)
        .onErrorResume(throwable ->
            Flux.just(createPayloadFromThrowable(throwable)));

private Payload createPayloadFromThrowable(Throwable t) {
  return ByteBufPayload.create(ErrorFrameFlyweight.encode(DEFAULT, 0, t));
}
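
The onErrorResume in requestStream acts only on that one stream: if a mapping step throws, the error is replaced by a single payload built by createPayloadFromThrowable and the stream then ends. It cannot intercept a connection-level failure such as the keep-alive timeout, which is raised outside this pipeline. A plain-Java model of the per-stream behaviour follows; `mapWithFallback` is a hypothetical helper, not a Reactor API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical plain-Java model of map(...).onErrorResume(t -> Flux.just(fallback(t))).
// Not a Reactor API: it only illustrates the per-stream semantics.
final class ErrorToValue {
    static <T, R> List<R> mapWithFallback(List<T> source,
                                          Function<? super T, ? extends R> mapper,
                                          Function<? super Throwable, ? extends R> fallback) {
        List<R> out = new ArrayList<>();
        for (T item : source) {
            try {
                out.add(mapper.apply(item));
            } catch (RuntimeException e) {
                // Replace the failure with a single fallback element and stop:
                // like onErrorResume, the original sequence is not continued.
                out.add(fallback.apply(e));
                break;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> result = mapWithFallback(
                List.of("1", "2", "x", "4"),
                s -> "ok:" + Integer.parseInt(s),
                t -> "error:" + t.getClass().getSimpleName());
        System.out.println(result); // [ok:1, ok:2, error:NumberFormatException]
    }
}
```

Note that the element after the failing one ("4") is never processed, mirroring how onErrorResume switches to the fallback publisher instead of resuming the original source.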

Any help would be greatly appreciated.

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 27 (12 by maintainers)

Top GitHub Comments

lksvenoy-r7 commented, May 21, 2019 (1 reaction)

@mostroverkhov It seems to be much, much more resilient now. I actually managed to run out of ENIs in Amazon, so I haven't been able to do all the testing I wanted so far. I'll report back on the issue tomorrow and let you know whether it resolves the problem; it seems like it does. Thank you for your help.

lksvenoy-r7 commented, May 22, 2019

I've upgraded the instance I was testing on, and it seems remarkably stable. I am fetching gigabytes of data over the socket with no problems, except that heavy load occasionally stops keep-alives from going through, which kills the connection. (This is probably my own fault.) I am closing this ticket, as the most recent snapshot fixes my problems. I am very much looking forward to it being released, as I can't live without it!
