Occasional NullPointerException in io.grpc.internal.RetriableStream.drain(RetriableStream.java:279) when using hedgingPolicy
What version of gRPC-Java are you using?
Version 1.42.2
What is your environment?
Linux
What did you expect to see?
I’m using the hedging retry policy via configuration and occasionally see a NullPointerException pop up.
Here’s a snippet of the Kotlin code that configures the hedging policy with an 85ms hedging delay:
.defaultServiceConfig(
    mapOf(
        "loadBalancingPolicy" to "round_robin",
        "methodConfig" to listOf(
            mapOf(
                "name" to listOf(
                    mapOf(
                        "service" to "my.org.Service",
                        "method" to "MyMethod"
                    )
                ),
                "waitForReady" to true,
                "hedgingPolicy" to mapOf(
                    // Numeric values in the service-config map must be Doubles.
                    "maxAttempts" to 2.0,
                    "hedgingDelay" to "0.085s",
                    "nonFatalStatusCodes" to listOf(Status.UNAVAILABLE.code.name)
                )
            )
        )
    )
)
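For context, here is a minimal sketch of how a service config like this plugs into the channel builder. The target address and the helper function name are placeholders, not taken from the original report; enableRetry() is shown only for clarity, since retries are already enabled by default on recent gRPC-Java releases.

import io.grpc.ManagedChannelBuilder

// Sketch only: the address is a placeholder, and hedgingServiceConfig is
// assumed to be the map built exactly as in the snippet above.
fun buildChannel(hedgingServiceConfig: Map<String, Any>) =
    ManagedChannelBuilder
        .forAddress("my-service.example.com", 443) // hypothetical target
        .enableRetry()
        .defaultServiceConfig(hedgingServiceConfig)
        .useTransportSecurity()
        .build()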
What did you see instead?
java.lang.NullPointerException: null
at io.grpc.internal.RetriableStream.drain(RetriableStream.java:279)
at io.grpc.internal.RetriableStream.access$1100(RetriableStream.java:55)
at io.grpc.internal.RetriableStream$HedgingRunnable$1.run(RetriableStream.java:476)
at io.micrometer.core.instrument.composite.CompositeTimer.record(CompositeTimer.java:79)
at io.micrometer.core.instrument.Timer.lambda$wrap$0(Timer.java:160)
at io.micronaut.scheduling.instrument.InvocationInstrumenterWrappedRunnable.run(InvocationInstrumenterWrappedRunnable.java:47)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.lang.Thread.run(Thread.java:834)
Steps to reproduce the bug
Enable the hedging policy configuration shown above.
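The report does not include a driver, but roughly, any workload where calls regularly take longer than the 85 ms hedgingDelay will cause the hedging timer to fire and a second attempt to be drained. A hypothetical sketch (MyServiceGrpc, MyRequest, and myMethod are stand-ins for the reporter's generated stubs, not from the issue):

// Hypothetical load driver; assumes the channel built with the config above.
val stub = MyServiceGrpc.newBlockingStub(channel)
repeat(10_000) {
    runCatching {
        stub.myMethod(MyRequest.newBuilder().build()) // calls slower than 85 ms trigger a hedge attempt
    }
}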
Top GitHub Comments
I don’t think those guarded-by annotations are that important. All those locks are the same instance, just aliases, so it is more that the code is written in a way that can’t be verified by tooling.
Since writing a message has to be done after start(), it feels like that buffer leak should be unrelated. But that stack trace is in the same hedging draining code, which does make it suspiciously related.
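To make the ordering that comment relies on concrete, here is a minimal sketch of the public ClientCall contract; the method descriptor, listener, and request value are placeholders, not code from the issue. sendMessage() is only legal after start(), which is why message buffering should not come into play before the call has been started.

import io.grpc.CallOptions
import io.grpc.Channel
import io.grpc.ClientCall
import io.grpc.Metadata
import io.grpc.MethodDescriptor

// Ordering-contract sketch only; 'method', 'listener' and 'request' are placeholders.
fun <ReqT, RespT> callOnce(
    channel: Channel,
    method: MethodDescriptor<ReqT, RespT>,
    listener: ClientCall.Listener<RespT>,
    request: ReqT
) {
    val call: ClientCall<ReqT, RespT> = channel.newCall(method, CallOptions.DEFAULT)
    call.start(listener, Metadata())  // must come first
    call.request(1)                   // ask for one response message
    call.sendMessage(request)         // writing a message is only valid after start()
    call.halfClose()
}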