[SUPPORT] Deltastreamer job does not terminate on Kubernetes when hoodie.metrics.on=true

Describe the problem you faced

We've noticed that when Hudi JMX metrics are enabled, the shutdown hook is never called. We took a thread dump to check which threads remain in the running state and keep the JVM up.

"RMI TCP Accept-9889": running
	at java.net.PlainSocketImpl.socketAccept(Native Method)
	at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
	at java.net.ServerSocket.implAccept(ServerSocket.java:560)
	at java.net.ServerSocket.accept(ServerSocket.java:528)
	at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:405)
	at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:377)
	at java.lang.Thread.run(Thread.java:750)
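
Only non-daemon threads prevent a JVM from exiting, so it can help to filter the dump down to those. In the full thread dump further down, "RMI Reaper" is the only RMI thread not marked daemon, while "RMI TCP Accept-9889" is. The following is a minimal sketch, not part of Hudi or Spark, that lists live non-daemon threads from inside the process; the class name `NonDaemonThreads` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Lists live threads that are not daemons, i.e. the ones that can block JVM exit. */
final class NonDaemonThreads {

    /** Returns the names of all live non-daemon threads except the calling thread. */
    static List<String> list() {
        List<String> names = new ArrayList<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.isAlive() && !t.isDaemon() && t != Thread.currentThread()) {
                names.add(t.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : list()) {
            System.out.println("non-daemon thread still alive: " + name);
        }
    }
}
```

Calling `NonDaemonThreads.list()` right after the Spark Context is stopped should show which threads are still holding the JVM open.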

To Reproduce

Steps to reproduce the behavior:

  1. Enable JMX hoodie metrics
    hoodie.metrics.on=true
    hoodie.metrics.reporter.type=JMX
  2. Run a Deltastreamer job on Kubernetes
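
The behavior can probably be approximated without Hudi or Spark. The sketch below uses only the standard `javax.management.remote` API to start an RMI-based JMX connector server and then stop it in a `finally` block. It assumes, and this report does not confirm, that the hang comes from a connector server that is started but never stopped. Such a server typically keeps non-daemon RMI threads alive until `stop()` is called. The class name `JmxShutdownSketch` and the use of port 0 (an ephemeral port) are illustrative choices, not Hudi's configuration.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.remote.JMXConnectorServer;
import javax.management.remote.JMXConnectorServerFactory;
import javax.management.remote.JMXServiceURL;

/** Starts a JMX connector over RMI and makes sure it is stopped again. */
final class JmxShutdownSketch {

    /** Starts an RMI-based JMX connector server on an ephemeral port (port 0). */
    static JMXConnectorServer startConnector() throws Exception {
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        JMXServiceURL url = new JMXServiceURL("rmi", "localhost", 0);
        JMXConnectorServer server =
                JMXConnectorServerFactory.newJMXConnectorServer(url, null, mbs);
        server.start();
        return server;
    }

    public static void main(String[] args) throws Exception {
        JMXConnectorServer server = startConnector();
        try {
            System.out.println("connector active: " + server.isActive());
            // The actual job would run here.
        } finally {
            // Without this call the exported RMI objects stay registered.
            server.stop();
        }
    }
}
```

Removing the `server.stop()` call is a quick way to check whether the process then stays up after `main` returns, as described in this issue.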

Expected behavior

We expect the JVM to stop once the Spark Context is stopped.

Environment Description

  • Hudi version : 0.9.0

  • Spark version : 3.1.2

  • Hive version : 2.3.7

  • Hadoop version : N/A

  • Storage (HDFS/S3/GCS…) : S3

  • Running on Docker? (yes/no) : yes

  • Java Version :

openjdk version "1.8.0_322"
OpenJDK Runtime Environment (build 1.8.0_322-b06)
OpenJDK 64-Bit Server VM (build 25.322-b06, mixed mode)

&

openjdk version "11.0.14.1" 2022-02-08
OpenJDK Runtime Environment 18.9 (build 11.0.14.1+1)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.14.1+1, mixed mode, sharing)

Additional context

We tested this with both Java 11 and Java 8 and got the same results. The full thread dump below was taken with kill -3 <pid> after the Spark Context was successfully stopped.
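
When sending a signal to the process inside the pod is inconvenient, a similar dump can be produced from within the JVM. The following is a minimal sketch using the standard `ThreadMXBean` API; the class name `InProcessThreadDump` is hypothetical, and the output is simpler than what kill -3 prints (for example, it does not show the daemon flag, which `ThreadInfo` only exposes from Java 9 on).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Builds a simple textual thread dump from inside the running JVM. */
final class InProcessThreadDump {

    /** Returns the name, state, and stack frames of every live thread. */
    static String dump() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // false, false: skip locked monitor and synchronizer details.
        for (ThreadInfo info : bean.dumpAllThreads(false, false)) {
            if (info == null) {
                continue; // defensive guard against a missing entry
            }
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```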

The only relevant issue I could find is this

Full Thread Dump Output
Full thread dump OpenJDK 64-Bit Server VM (25.322-b06 mixed mode):

"DestroyJavaVM" #73 prio=5 os_prio=0 tid=0x00007f8a7c011000 nid=0x1a waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"RMI TCP Connection(idle)" #67 daemon prio=5 os_prio=0 tid=0x00007f8974003800 nid=0x5d waiting on condition [0x00007f897b8f9000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000000c10bc948> (a java.util.concurrent.SynchronousQueue$TransferStack)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
    at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
    at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:941)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1073)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)

"RMI RenewClean-[100.97.64.2:9889]" #66 daemon prio=5 os_prio=0 tid=0x00007f896c006800 nid=0x5c in Object.wait() [0x00007f897b9fa000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:144)
    - locked <0x00000000c1064a20> (a java.lang.ref.ReferenceQueue$Lock)
    at sun.rmi.transport.DGCClient$EndpointEntry$RenewCleanThread.run(DGCClient.java:563)
    at java.lang.Thread.run(Thread.java:750)

"RMI Scheduler(0)" #65 daemon prio=5 os_prio=0 tid=0x00007f8a7ddf6000 nid=0x5b waiting on condition [0x00007f897bafb000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000000c1064c28> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)

"RMI TCP Connection(idle)" #64 daemon prio=5 os_prio=0 tid=0x00007f8974001800 nid=0x5a waiting on condition [0x00007f897bbfc000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000000c10bc948> (a java.util.concurrent.SynchronousQueue$TransferStack)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
    at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
    at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:941)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1073)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)

"GC Daemon" #63 daemon prio=2 os_prio=0 tid=0x00007f8a7dde9800 nid=0x59 in Object.wait() [0x00007f897befd000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at sun.misc.GC$Daemon.run(GC.java:117)
    - locked <0x00000000c1067260> (a sun.misc.GC$LatencyLock)

"RMI Reaper" #62 prio=5 os_prio=0 tid=0x00007f8a7dde7800 nid=0x58 in Object.wait() [0x00007f897bffe000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:144)
    - locked <0x00000000c10673e8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:165)
    at sun.rmi.transport.ObjectTable$Reaper.run(ObjectTable.java:351)
    at java.lang.Thread.run(Thread.java:750)

"RMI TCP Accept-9889" #61 daemon prio=5 os_prio=0 tid=0x00007f8a7c275800 nid=0x57 runnable [0x00007f89d81e5000]
   java.lang.Thread.State: RUNNABLE
    at java.net.PlainSocketImpl.socketAccept(Native Method)
    at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
    at java.net.ServerSocket.implAccept(ServerSocket.java:560)
    at java.net.ServerSocket.accept(ServerSocket.java:528)
    at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:405)
    at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:377)
    at java.lang.Thread.run(Thread.java:750)

"java-sdk-http-connection-reaper" #58 daemon prio=5 os_prio=0 tid=0x00007f8a7da85800 nid=0x54 waiting on condition [0x00007f89d8ae8000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at com.amazonaws.http.IdleConnectionReaper.run(IdleConnectionReaper.java:181)

"Timer for 's3a-file-system' metrics system" #57 daemon prio=5 os_prio=0 tid=0x00007f8a7e0fa000 nid=0x53 in Object.wait() [0x00007f89d8fe9000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.util.TimerThread.mainLoop(Timer.java:552)
    - locked <0x00000000d6720a88> (a java.util.TaskQueue)
    at java.util.TimerThread.run(Timer.java:505)

"MutableQuantiles-0" #56 daemon prio=5 os_prio=0 tid=0x00007f8a7e0c2000 nid=0x52 waiting on condition [0x00007f89d90ea000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000000d66e87c8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)

"org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner" #55 daemon prio=5 os_prio=0 tid=0x00007f8a7e0bd000 nid=0x51 in Object.wait() [0x00007f89d91eb000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:144)
    - locked <0x00000000d66e8a08> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:165)
    at org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner.run(FileSystem.java:3762)
    at java.lang.Thread.run(Thread.java:750)

"Okio Watchdog" #41 daemon prio=5 os_prio=0 tid=0x00007f8a7d1ea800 nid=0x41 in Object.wait() [0x00007f89dbbfe000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at okio.AsyncTimeout.awaitTimeout(AsyncTimeout.java:347)
    at okio.AsyncTimeout$Watchdog.run(AsyncTimeout.java:312)
    - locked <0x00000000d62d18c8> (a java.lang.Class for okio.AsyncTimeout)

"OkHttp ConnectionPool" #40 daemon prio=5 os_prio=0 tid=0x00007f8a7da40000 nid=0x40 in Object.wait() [0x00007f8a4c69b000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:460)
    at okhttp3.ConnectionPool$1.run(ConnectionPool.java:67)
    - locked <0x00000000d5ccb0e8> (a okhttp3.ConnectionPool)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)

"process reaper" #10 daemon prio=10 os_prio=0 tid=0x00007f8a7ec2d800 nid=0x24 waiting on condition [0x00007f8a6c082000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000000d5888a60> (a java.util.concurrent.SynchronousQueue$TransferStack)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
    at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
    at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:941)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1073)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)

"Service Thread" #7 daemon prio=9 os_prio=0 tid=0x00007f8a7c0c1800 nid=0x21 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C1 CompilerThread1" #6 daemon prio=9 os_prio=0 tid=0x00007f8a7c0bc800 nid=0x20 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" #5 daemon prio=9 os_prio=0 tid=0x00007f8a7c0ba800 nid=0x1f waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" #4 daemon prio=9 os_prio=0 tid=0x00007f8a7c0a6800 nid=0x1e waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" #3 daemon prio=8 os_prio=0 tid=0x00007f8a7c083000 nid=0x1d in Object.wait() [0x00007f8a810e3000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:144)
    - locked <0x00000000d55feb78> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:165)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:216)

"Reference Handler" #2 daemon prio=10 os_prio=0 tid=0x00007f8a7c07e800 nid=0x1c in Object.wait() [0x00007f8a811e4000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:502)
    at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
    - locked <0x00000000d55fed30> (a java.lang.ref.Reference$Lock)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)

"VM Thread" os_prio=0 tid=0x00007f8a7c075000 nid=0x1b runnable

"VM Periodic Task Thread" os_prio=0 tid=0x00007f8a7c0c4800 nid=0x22 waiting on condition

Issue Analytics

  • State: closed
  • Created a year ago
  • Reactions: 3
  • Comments: 11 (7 by maintainers)

Top GitHub Comments

toninis commented, Apr 27, 2022 (1 reaction)

@yihua I will try to test this today. Thanks

nsivabalan commented, Sep 23, 2022 (0 reactions)

Going ahead and closing this for now. Feel free to re-open or raise a new issue if you need any more assistance. Thanks!
