In extreme concurrent situation permittedNumberOfCallsInHalfOpenState isn't working properly
Resilience4j version: 1.5.0
Java version: 11
I created a circuit breaker with permittedNumberOfCallsInHalfOpenState(1), but under high load with timeouts, the HALF_OPEN state lets more than one call through to the backend.
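The behavior expected from permittedNumberOfCallsInHalfOpenState(1) can be sketched with a plain atomic counter. This is only a model under stated assumptions, not Resilience4j's implementation: the class HalfOpenPermits and its methods are hypothetical names, and the 15 callers simply mirror the JMeter thread count from the reproduction steps.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model, not Resilience4j code: a fixed budget of half-open permits.
class HalfOpenPermits {
    private final AtomicInteger remaining;

    HalfOpenPermits(int permittedCalls) {
        this.remaining = new AtomicInteger(permittedCalls);
    }

    // Takes one permit atomically; returns false once the budget is used up.
    boolean tryAcquire() {
        while (true) {
            int current = remaining.get();
            if (current <= 0) {
                return false;
            }
            if (remaining.compareAndSet(current, current - 1)) {
                return true;
            }
        }
    }

    // Releases `callers` threads at the same moment and counts how many get a permit.
    static int countGranted(int permittedCalls, int callers) {
        HalfOpenPermits permits = new HalfOpenPermits(permittedCalls);
        AtomicInteger granted = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(callers);
        for (int i = 0; i < callers; i++) {
            pool.execute(() -> {
                try {
                    start.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                if (permits.tryAcquire()) {
                    granted.incrementAndGet();
                }
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("callers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return granted.get();
    }

    public static void main(String[] args) {
        System.out.println("granted = " + countGranted(1, 15));
    }
}
```

With a budget of 1, only one of the 15 racing callers should ever be granted a permit. Two timeouts one millisecond apart in the log are what make the observed behavior look inconsistent with that expectation.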
How to reproduce:
- Codebase: https://github.com/rejuan/Resilience4jPractice
- Create a mock URL with SoapUI or any other tool (URL: http://localhost:8089/mock) and set a response delay of 2 seconds to produce a timeout scenario
- Send load with Apache JMeter or any other tool: Number of Threads: 15, Loop Count: infinite
How to observe: In the log, we find multiple timeout occurrences only a couple of milliseconds apart.
2020-07-21 20:43:28,958 ERROR com.shortandprecise.resilience4jpractice.service.WebClientService [parallel-3] timeout happened
2020-07-21 20:43:28,959 ERROR com.shortandprecise.resilience4jpractice.service.WebClientService [parallel-4] timeout happened
Issue Analytics
- State:
- Created 3 years ago
- Comments:5 (2 by maintainers)
Top GitHub Comments
Hello @RobWin,
I am a research assistant at the University of Stuttgart.
I read the CircuitBreaker documentation at https://resilience4j.readme.io/docs/circuitbreaker#
I formally specified the CircuitBreaker behavioral design explained there (sliding window type) in TLA+.
I could verify various invariant and liveness properties of the specified design except one: the number of executions while the CircuitBreaker is in the half-open state must be at most a predefined constant (permittedNumberOfCallsInHalf). However, the model checker tells me that the system can grant permission to a few available threads in the closed state before moving from closed to open and from open to half-open. Therefore, I can never assert that the number of executions in the half-open state is at most that constant. What I can assert is that the number of executions in the half-open state is less than or equal to permittedNumberOfCallsInHalf + the number of available calling threads.
Suppose more than 20 threads obtain permission before the CircuitBreaker switches to the half-open state. 20 calls + permittedNumberOfCallsInHalf could put pressure on a service that is already down, although one could argue that this could be resolved with the bulkhead pattern.
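The interleaving reported by the model checker can be illustrated with a small single-threaded Java model. It is a sketch under simplifying assumptions, not the TLA+ specification and not Resilience4j's code: BreakerModel, its methods, and the rule that every call in the closed state is admitted are hypothetical, and the calls admitted in the closed state are assumed to be still running when the breaker reaches the half-open state.

```java
// Hypothetical model of the interleaving described above; not Resilience4j code.
class BreakerModel {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int permittedInHalfOpen;
    private State state = State.CLOSED;
    private int halfOpenPermitsLeft;
    private int inFlight; // calls that hold a permission and have not finished yet

    BreakerModel(int permittedInHalfOpen) {
        this.permittedInHalfOpen = permittedInHalfOpen;
    }

    // CLOSED admits every call, HALF_OPEN admits a fixed budget, OPEN admits none.
    boolean tryAcquire() {
        switch (state) {
            case CLOSED:
                inFlight++;
                return true;
            case HALF_OPEN:
                if (halfOpenPermitsLeft > 0) {
                    halfOpenPermitsLeft--;
                    inFlight++;
                    return true;
                }
                return false;
            default:
                return false;
        }
    }

    // Switching state does not cancel calls that were already admitted.
    void transitionTo(State next) {
        state = next;
        if (next == State.HALF_OPEN) {
            halfOpenPermitsLeft = permittedInHalfOpen;
        }
    }

    int getInFlight() {
        return inFlight;
    }

    // Calls still running in HALF_OPEN when `closedCallers` were admitted earlier.
    static int worstCaseInFlight(int permitted, int closedCallers) {
        BreakerModel breaker = new BreakerModel(permitted);
        for (int i = 0; i < closedCallers; i++) {
            breaker.tryAcquire(); // admitted in CLOSED, none has completed yet
        }
        breaker.transitionTo(State.OPEN);
        breaker.transitionTo(State.HALF_OPEN);
        for (int i = 0; i < permitted + 1; i++) {
            breaker.tryAcquire(); // only `permitted` of these attempts succeed
        }
        return breaker.getInFlight();
    }

    public static void main(String[] args) {
        System.out.println("in flight = " + worstCaseInFlight(1, 20));
    }
}
```

In this model the half-open budget itself is never exceeded. The extra load comes only from calls admitted before the transition, which matches the bound of permittedNumberOfCallsInHalf + the number of available calling threads.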
My question is: how is this situation handled in the code? Is it a design issue in CircuitBreaker at all?
Sorry for taking your time, and Merry Christmas in advance. Sincerely, Alireza
Hi @RobWin,
Actually, the above case isn't possible because the HTTP call timeout is 1 second, whereas the wait duration in the open state is 10 seconds. So the WebClient call is initiated from the HALF_OPEN state.
I have updated the sample code with a state transition log: https://github.com/rejuan/Resilience4jPractice/blob/bf12f056eceaa9c572228b675f72ce6b0441e472/src/main/java/com/shortandprecise/resilience4jpractice/config/ApplicationStartUp.java#L41-L42
I added some more logging as well. Hopefully it helps.