Infinite loop due to thread safety of MpscUnboundedArrayQueue.poll

As its name (Multi Producer Single Consumer) indicates, MpscUnboundedArrayQueue is not thread-safe when polling. If multiple threads poll from it, those threads may end up in an infinite loop here:

                do
                {
                    e = lvElement(buffer, offset);
                }
                while (e == null);
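
To see why this loop may never exit, here is a deterministic sketch of the interleaving. It is a simplified, hypothetical model of a single-consumer poll: the class `SingleConsumerPollSketch`, its methods, and the spin budget are assumptions made for illustration and are not the JCTools source. Two consumers read the same consumer index; the first takes the element and clears the slot, so the second finds a null slot while the producer index says the queue is not empty, and it spins waiting for an element that may never arrive.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Simplified, hypothetical model of a single-consumer poll. Not the JCTools source.
final class SingleConsumerPollSketch {
    static final String SPINNING = "<still spinning>"; // marker: spin budget exhausted

    private final AtomicReferenceArray<String> buffer = new AtomicReferenceArray<>(8);
    private long producerIndex; // advanced by offer()
    private long consumerIndex; // advanced by takeAt() with no CAS: one consumer is assumed

    void offer(String e) {
        buffer.set((int) (producerIndex % buffer.length()), e);
        producerIndex++;
    }

    // First half of poll(): the consumer reads the index it believes it owns.
    long readConsumerIndex() {
        return consumerIndex;
    }

    // Second half of poll(): take the element at a previously read index.
    // The loop in the issue has no budget; spinBudget only lets this demo terminate.
    String takeAt(long index, int spinBudget) {
        int slot = (int) (index % buffer.length());
        String e = buffer.get(slot);
        if (e == null) {
            if (index == producerIndex) {
                return null; // queue is empty
            }
            // Indices say "not empty", so wait for the element to become visible.
            int spins = 0;
            do {
                if (spins++ == spinBudget) {
                    return SPINNING;
                }
                e = buffer.get(slot);
            } while (e == null);
        }
        buffer.set(slot, null);
        consumerIndex = index + 1;
        return e;
    }

    // Replays the problematic interleaving of two consumers on a single thread.
    static String[] raceDemo() {
        SingleConsumerPollSketch q = new SingleConsumerPollSketch();
        q.offer("a");
        long seenByA = q.readConsumerIndex(); // consumer A reads index 0
        long seenByB = q.readConsumerIndex(); // consumer B also reads index 0
        String a = q.takeAt(seenByA, 1_000);  // A takes "a" and clears the slot
        String b = q.takeAt(seenByB, 1_000);  // B finds null, but producerIndex is 1
        return new String[] {a, b};
    }

    public static void main(String[] args) {
        String[] r = raceDemo();
        System.out.println("A got " + r[0] + ", B got " + r[1]);
    }
}
```

In this model consumer B only stops because of the artificial spin budget; without it, B would keep re-reading a slot that consumer A has already emptied.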

UnboundedProcessor uses MpscUnboundedArrayQueue internally. When a connection needs to be terminated (due to no keep-alive acks or other reasons), more than one thread will try to poll from its MpscUnboundedArrayQueue.

As shown below,

  • “reactor-tcp-nio-7” is the “normal” working thread polling from the queue.
  • “parallel-7” is the thread responsible for terminating a connection due to “no keep-alive acks”. It tries to poll all elements from the queue in order to release their memory.

public void clear() {
  while (!queue.isEmpty()) {
    T t = queue.poll();
    if (t != null) {
      release(t);
    }
  }
}

"reactor-tcp-nio-7@16412" daemon prio=5 tid=0x250 nid=NA runnable
  java.lang.Thread.State: RUNNABLE
      at io.rsocket.internal.jctools.queues.BaseMpscLinkedArrayQueue.poll(BaseMpscLinkedArrayQueue.java:256)
      at io.rsocket.internal.UnboundedProcessor.poll(UnboundedProcessor.java:330)
      at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.poll(FluxMapFuseable.java:174)
      at reactor.netty.channel.MonoSendMany$SendManyInner.run(MonoSendMany.java:264)
      at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
      at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
      at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
      at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
      at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
      at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
      at java.lang.Thread.run(Thread.java:745)

"parallel-7@16290" daemon prio=5 tid=0x258 nid=NA runnable
  java.lang.Thread.State: RUNNABLE
      at io.rsocket.internal.jctools.queues.BaseMpscLinkedArrayQueue.poll(BaseMpscLinkedArrayQueue.java:256)
      at io.rsocket.internal.UnboundedProcessor.clear(UnboundedProcessor.java:346)
      at io.rsocket.internal.UnboundedProcessor.cancel(UnboundedProcessor.java:317)
      at io.rsocket.internal.UnboundedProcessor.dispose(UnboundedProcessor.java:364)
      at io.rsocket.RSocketRequester.terminate(RSocketRequester.java:594)
      at io.rsocket.RSocketRequester.tryTerminate(RSocketRequester.java:559)
      at io.rsocket.RSocketRequester.tryTerminateOnKeepAlive(RSocketRequester.java:541)
      at io.rsocket.RSocketRequester$$Lambda$904.1238285436.accept(Unknown Source:-1)
      at io.rsocket.keepalive.KeepAliveSupport.tryTimeout(KeepAliveSupport.java:110)
      at io.rsocket.keepalive.KeepAliveSupport$ClientKeepAliveSupport.onIntervalTick(KeepAliveSupport.java:146)
      at io.rsocket.keepalive.KeepAliveSupport.lambda$start$0(KeepAliveSupport.java:54)
      at io.rsocket.keepalive.KeepAliveSupport$$Lambda$906.2130835366.accept(Unknown Source:-1)
      at reactor.core.publisher.LambdaSubscriber.onNext(LambdaSubscriber.java:160)
      at reactor.core.publisher.FluxInterval$IntervalRunnable.run(FluxInterval.java:123)
      at reactor.core.scheduler.PeriodicWorkerTask.call(PeriodicWorkerTask.java:59)
      at reactor.core.scheduler.PeriodicWorkerTask.run(PeriodicWorkerTask.java:73)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)

Expected Behavior

Thread-safe. No infinite loop.

Actual Behavior

Threads sometimes end up in an infinite loop when terminating a connection. It has happened 3 times in production for me.

Steps to Reproduce

Demo of the infinite loop when multiple threads try to poll from MpscUnboundedArrayQueue:

import io.rsocket.internal.jctools.queues.MpscUnboundedArrayQueue
import org.junit.Test
import reactor.util.concurrent.Queues

class MpscUnboundedArrayQueueTest {
    @Test
    fun test() {
        val a = MpscUnboundedArrayQueue<String>(Queues.SMALL_BUFFER_SIZE)

        for(x in 0 until 4) {
            val t = Thread(
                Runnable {
                    while (true) {
                        a.offer(java.time.LocalTime.now().toString())
                        Thread.sleep(10)
                    }
                }, "p$x"
            )
            t.isDaemon = true
            t.start()
        }
        for(x in 0 until 4) {
            Thread(
                Runnable {
                    while (true) {
                        println(java.time.LocalTime.now().toString() + ": " + Thread.currentThread().toString() + ": " + a.poll())
                    }
                }, "c$x"
            ).start()
        }
        while (true) {
            Thread.sleep(10000)
        }
    }
}

Possible Solution

Replace MpscUnboundedArrayQueue with an MPMC (Multi Producer Multi Consumer) queue.
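
As a rough illustration of this suggestion, the sketch below drains one queue from several threads at once, using `java.util.concurrent.ConcurrentLinkedQueue` from the JDK as a stand-in for an MPMC queue. That choice is an assumption made for demonstration: it is not the queue RSocket uses, and the class and method names here are hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: several threads drain a multi-consumer-safe queue.
final class MultiConsumerDrainSketch {
    // Offers `items` elements, then drains them from `consumers` threads at once.
    // Returns {total number of successful polls, number of distinct elements polled}.
    static int[] drainConcurrently(int items, int consumers) {
        ConcurrentLinkedQueue<Integer> queue = new ConcurrentLinkedQueue<>();
        for (int i = 0; i < items; i++) {
            queue.offer(i);
        }
        AtomicInteger totalPolls = new AtomicInteger();
        Set<Integer> distinct = ConcurrentHashMap.newKeySet();
        Thread[] threads = new Thread[consumers];
        for (int c = 0; c < consumers; c++) {
            threads[c] = new Thread(() -> {
                Integer e;
                // poll() returns null on an empty queue instead of spinning.
                while ((e = queue.poll()) != null) {
                    totalPolls.incrementAndGet();
                    distinct.add(e);
                }
            }, "c" + c);
            threads[c].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for consumers", ex);
        }
        return new int[] {totalPolls.get(), distinct.size()};
    }

    public static void main(String[] args) {
        int[] r = drainConcurrently(10_000, 4);
        System.out.println(r[0] + " polls, " + r[1] + " distinct elements");
    }
}
```

Every consumer thread terminates once the queue is empty, and each element is handed to exactly one of them. The trade-off is that a general-purpose multi-consumer queue typically costs more per operation than a specialized single-consumer one.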

Your Environment

  • RSocket version(s) used: 1.0.0-RC6
  • Other relevant libraries versions (eg. netty, …):
  • Platform (eg. JVM version (java -version) or Node version (node --version)): Java™ SE Runtime Environment (build 1.8.0_101-b13)
  • OS and version (eg uname -a): Darwin MacBook-Pro.local 19.6.0 Darwin Kernel Version 19.6.0: Mon Aug 31 22:12:52 PDT 2020; root:xnu-6153.141.2~1/RELEASE_X86_64 x86_64

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 7 (7 by maintainers)

Top GitHub Comments

3 reactions
xiazuojie commented, Oct 23, 2020

I mean we still use a single-consumer queue (the successor to MpScUnboundedQueue), but in the latest implementation it should no longer be possible to run into the case where you dispose an UnboundedProcessor and poll elements at the same time.

Let me give it a check. I’ll report back later

0 reactions
OlegDokuka commented, Feb 26, 2021

@xiazuojie should be fixed in 1.0.4! Let me know if it works for you as well
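
The comments above describe keeping a single-consumer queue while making sure dispose and poll cannot run at the same time. A minimal sketch of one common way to get that property is a work-in-progress counter that lets only one thread poll at any moment. This is an assumption for illustration, not the actual change shipped in 1.0.4; `SerializedDrainSketch` and its members are hypothetical, and `ConcurrentLinkedQueue` stands in for the single-consumer queue.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: serialize all polling behind a work-in-progress counter.
final class SerializedDrainSketch {
    private final Queue<Integer> queue = new ConcurrentLinkedQueue<>(); // stand-in queue
    private final AtomicInteger wip = new AtomicInteger();    // pending drain requests
    private final AtomicInteger inside = new AtomicInteger(); // threads polling right now
    final AtomicInteger maxInside = new AtomicInteger();      // highest value of `inside` seen
    final AtomicInteger drained = new AtomicInteger();        // elements polled so far

    void offer(int value) {
        queue.offer(value);
    }

    // Any thread may call this; only the one that moves wip from 0 to 1 polls.
    void drain() {
        if (wip.getAndIncrement() != 0) {
            return; // another thread is draining and will notice this request
        }
        int missed = 1;
        do {
            maxInside.accumulateAndGet(inside.incrementAndGet(), Math::max);
            while (queue.poll() != null) {
                drained.incrementAndGet();
            }
            inside.decrementAndGet();
            missed = wip.addAndGet(-missed); // non-zero: requests arrived meanwhile
        } while (missed != 0);
    }

    // Starts `threads` workers that each offer and request a drain `itemsPerThread` times.
    static SerializedDrainSketch run(int threads, int itemsPerThread) {
        SerializedDrainSketch s = new SerializedDrainSketch();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < itemsPerThread; n++) {
                    s.offer(n);
                    s.drain();
                }
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", ex);
        }
        return s;
    }

    public static void main(String[] args) {
        SerializedDrainSketch s = run(4, 1_000);
        System.out.println("drained=" + s.drained.get() + ", max concurrent pollers=" + s.maxInside.get());
    }
}
```

A thread that loses the race on `wip` returns immediately, and the current drainer repeats its loop to pick up whatever that thread enqueued, so the queue only ever sees one consumer.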
