FluxSwitchMap infinite while loop causes high CPU usage
Expected Behavior
The loop should not get stuck in an infinite loop and should not cause high CPU usage.
Actual Behavior
We noticed this problem in production, where our CPUs were spiking at random.
After some investigation with heap dumps, we found that the problem is in FluxSwitchMap#drain (FluxSwitchMap line 353), where it does:
Object second;
// busy-spins until the second element becomes visible in the queue
while ((second = q.poll()) == null) {
}
This causes high CPU usage, and we have to restart the application to stop it (otherwise it never recovers, even after days).
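To illustrate why that loop can pin a core indefinitely, here is a minimal hypothetical sketch, not Reactor's implementation. It assumes, as the quoted loop suggests, that the consumer has already polled the first element of a two-element entry and then spins until the second one appears. If the second element is never offered, an unbounded loop never exits, which matches a core staying busy for days. The class SpinPairDemo, the method pollSecondBounded and the maxSpins cap are made up for illustration; the cap only makes the hazard observable and is not a proposed fix.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class SpinPairDemo {

    // Hypothetical helper with the same shape as the quoted loop, except that it
    // gives up after maxSpins empty polls instead of spinning forever.
    // Returns null if the second element never became visible.
    static Object pollSecondBounded(Queue<Object> q, long maxSpins) {
        Object second;
        long spins = 0;
        while ((second = q.poll()) == null) {
            if (++spins >= maxSpins) {
                return null; // the unbounded version would keep burning CPU here
            }
        }
        return second;
    }

    public static void main(String[] args) {
        Queue<Object> q = new ConcurrentLinkedQueue<>();

        // Assumed failure mode: only the first half of an (inner, value) entry
        // is ever published to the queue.
        q.offer("inner-subscriber");

        Object first = q.poll();                          // consumer sees the first half
        Object second = pollSecondBounded(q, 1_000_000);  // and waits for the second
        System.out.println(first + " -> " + second);      // prints "inner-subscriber -> null"
    }
}
```

With the cap removed, main would never reach the println, and the thread running it would show up at 100% of one core.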
The screenshot above shows one of our production pods, which stays at that CPU level for days until we notice and restart the application.
The second screenshot shows a pod where the CPU spike has doubled, which means two parallel threads are stuck in the loop at the same time.
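A thread dump, as opposed to a heap dump, is one way to count how many threads are stuck in the loop at once. The following is a hypothetical diagnostic sketch that uses only the JDK's ThreadMXBean.dumpAllThreads: it lists the live threads whose stack contains a frame matching a class-name prefix and a method name. StuckThreadFinder, threadsInside and the stand-in spin method are illustrative names. Matching the prefix "reactor.core.publisher.FluxSwitchMap" with the method "drain" assumes that is how the stuck frame appears in the stack.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

public class StuckThreadFinder {

    // Names of live threads whose stack has a frame whose class name starts with
    // classPrefix and whose method is methodName,
    // e.g. threadsInside("reactor.core.publisher.FluxSwitchMap", "drain").
    static List<String> threadsInside(String classPrefix, String methodName) {
        List<String> names = new ArrayList<>();
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(false, false)) {
            for (StackTraceElement frame : info.getStackTrace()) {
                if (frame.getClassName().startsWith(classPrefix)
                        && frame.getMethodName().equals(methodName)) {
                    names.add(info.getThreadName());
                    break; // count each thread once
                }
            }
        }
        return names;
    }

    // Stand-in for the busy-spin: keeps a core busy until 'running' is cleared.
    static void spin(CountDownLatch started, AtomicBoolean running) {
        started.countDown();
        while (running.get()) {
            // busy-wait, like the poll() loop in the report
        }
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch started = new CountDownLatch(1);
        AtomicBoolean running = new AtomicBoolean(true);
        Thread t = new Thread(() -> spin(started, running), "spinner-1");
        t.setDaemon(true);
        t.start();
        started.await(); // the spinner is now inside spin()
        System.out.println(threadsInside(StuckThreadFinder.class.getName(), "spin")); // prints [spinner-1]
        running.set(false);
        t.join();
    }
}
```

On a live service, `jstack <pid>` gives the same stack information without code changes; two threads sitting in the same drain frame would match the doubled spike.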
Steps to Reproduce
This is not our production code, but I tried to reproduce the problem in a JUnit 4 test. I'm not great at writing reproducers, so if the test passes, please run it a few more times. It is a race condition and happens randomly, which is why the test repeats the StepVerifier many times.
While the test runs, at some point it stops printing numbers and the CPU spikes. That means the thread is stuck in that while loop.
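import java.time.Duration;
import org.junit.Test;
import reactor.core.publisher.Flux;
import reactor.test.StepVerifier;

public class SwitchMapSpinTest {
    // The JUnit 4 timeout makes the test fail instead of hanging forever
    @Test(timeout = 120_000)
    public void test() {
        // delayElements uses Schedulers.parallel() by default, so outer and inner
        // signals arrive on different threads and can race inside switchMap
        Flux<Integer> integerFlux = Flux.range(0, 20)
                .delayElements(Duration.ofMillis(2))
                .switchMap(s -> Flux.range(0, 20).delayElements(Duration.ofMillis(1)))
                .doOnNext(a -> System.out.println(a));
        // The race is timing-dependent, so repeat the verification many times
        for (int i = 0; i < 1000; i++) {
            StepVerifier.create(integerFlux)
                    .thenConsumeWhile(a -> true)
                    .verifyComplete();
        }
    }
}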
Possible Solution
Your Environment
- Spring Boot 2.2.6
- Spring Boot 2.4.1 (both versions fail)
- reactor-core-3.3.4.RELEASE
- openjdk8
Thanks, @TunaYagci! I now know the root cause of this issue, so stay tuned; I will keep you updated!
Huge thanks to @OlegDokuka and @simonbasle for your efforts! I will test the fix once it is released and update this issue with my findings.