GoogleCloudStorageFileSystem.create may block indefinitely (potential deadlock in BatchHelper)
We run hourly jobs in Dataproc that commit output by moving files from a staging directory in HDFS (mounted on local SSDs) to a bucket in GCS. Occasionally (roughly once a week), a job appears to be “stuck” indefinitely while committing output. This step takes ~1.5 minutes on average, but during these anomalous runs it can take hours; two days ago it took over 9 hours.
I was finally able to capture a stack trace by logging into the instance ~20 minutes after the job got “stuck” committing (i.e., moving files). On close examination, this appears to be a deadlock caused by the (now) daemon thread raised in this issue: https://github.com/GoogleCloudPlatform/bigdata-interop/issues/150.
Below is the full stack trace. In the main thread, commitOutput triggers the process described above. Several threads are dispatched to perform concurrent renames. In pool-26-thread-8, we can see GoogleCloudStorageFileSystem.create on the stack as a result of FileUtil.copy; it is awaiting completion of the batch as a result of BatchHelper.awaitRequestsCompletion, which in turn appears to be deadlocked owing to BatchHelper.execute in the gcsfs-batch-helper-3912 (daemon) thread.
Full thread dump OpenJDK 64-Bit Server VM (25.181-b13 mixed mode):
"gcsfs-batch-helper-3912" #4130 daemon prio=5 os_prio=0 tid=0x00007f92a4040800 nid=0x3130 runnable [0x00007f9330815000]
java.lang.Thread.State: RUNNABLE
at org.conscrypt.NativeCrypto.SSL_do_handshake(Native Method)
at org.conscrypt.NativeSsl.doHandshake(NativeSsl.java:392)
at org.conscrypt.ConscryptFileDescriptorSocket.startHandshake(ConscryptFileDescriptorSocket.java:225)
at org.conscrypt.ConscryptFileDescriptorSocket.waitForHandshake(ConscryptFileDescriptorSocket.java:474)
at org.conscrypt.ConscryptFileDescriptorSocket.getOutputStream(ConscryptFileDescriptorSocket.java:461)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:465)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
- locked <0x0000000651a1d8a8> (a sun.net.www.protocol.https.HttpsClient)
at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:264)
at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:367)
at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:191)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1156)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1050)
at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:177)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.connect(HttpsURLConnectionImpl.java:162)
at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:104)
at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.http.HttpRequest.execute(HttpRequest.java:981)
at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419)
at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.BatchHelper.execute(BatchHelper.java:175)
at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.BatchHelper.lambda$queue$0(BatchHelper.java:163)
at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.BatchHelper$$Lambda$96/631118845.call(Unknown Source)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
"org.apache.hadoop.hdfs.PeerCache@65577e18" #590 daemon prio=5 os_prio=0 tid=0x00007f91fc007800 nid=0x2335 waiting on condition [0x00007f91bcefa000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.hdfs.PeerCache.run(PeerCache.java:253)
at org.apache.hadoop.hdfs.PeerCache.access$000(PeerCache.java:46)
at org.apache.hadoop.hdfs.PeerCache$1.run(PeerCache.java:124)
at java.lang.Thread.run(Thread.java:748)
"pool-26-thread-16" #268 prio=5 os_prio=0 tid=0x00007f937353d000 nid=0x21ec waiting on condition [0x00007f9322efc000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x0000000651a01a30> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
"pool-26-thread-15" #267 prio=5 os_prio=0 tid=0x00007f937353c000 nid=0x21eb waiting on condition [0x00007f9322cfa000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x0000000651a01a30> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
"pool-26-thread-14" #266 prio=5 os_prio=0 tid=0x00007f937353b000 nid=0x21ea waiting on condition [0x00007f9322bf9000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x0000000651a01a30> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
"pool-26-thread-13" #265 prio=5 os_prio=0 tid=0x00007f937353a000 nid=0x21e9 waiting on condition [0x00007f93286ec000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x0000000651a01a30> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
"pool-26-thread-12" #264 prio=5 os_prio=0 tid=0x00007f9373539000 nid=0x21e8 waiting on condition [0x00007f9332931000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x0000000651a01a30> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
"pool-26-thread-11" #263 prio=5 os_prio=0 tid=0x00007f9373538000 nid=0x21e7 waiting on condition [0x00007f933222a000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x0000000651a01a30> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
"pool-26-thread-10" #262 prio=5 os_prio=0 tid=0x00007f9373536800 nid=0x21e6 waiting on condition [0x00007f9332129000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x0000000651a01a30> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
"pool-26-thread-9" #261 prio=5 os_prio=0 tid=0x00007f9373535800 nid=0x21e5 waiting on condition [0x00007f933262e000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x0000000651a01a30> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
"pool-26-thread-8" #260 prio=5 os_prio=0 tid=0x00007f9373534800 nid=0x21e4 waiting on condition [0x00007f9331d24000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x0000000651a1dbd8> (a java.util.concurrent.FutureTask)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429)
at java.util.concurrent.FutureTask.get(FutureTask.java:191)
at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.BatchHelper.awaitRequestsCompletion(BatchHelper.java:263)
at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.BatchHelper.flushIfPossible(BatchHelper.java:204)
at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.BatchHelper.flush(BatchHelper.java:235)
at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.getItemInfos(GoogleCloudStorageImpl.java:1723)
at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageFileSystem.getFileInfos(GoogleCloudStorageFileSystem.java:1211)
at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageFileSystem.mkdirs(GoogleCloudStorageFileSystem.java:514)
at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageFileSystem.create(GoogleCloudStorageFileSystem.java:243)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopOutputStream.createChannel(GoogleHadoopOutputStream.java:82)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopOutputStream.<init>(GoogleHadoopOutputStream.java:74)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.create(GoogleHadoopFileSystemBase.java:768)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1067)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1048)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:937)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:391)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:364)
at ir.fq.platform.common.hadoop.IOUtils.rename(IOUtils.java:102)
at ir.fq.platform.common.hadoop.JobUtils$$anonfun$performRenames$2$$anonfun$3$$anon$2$$anonfun$run$2.apply$mcZ$sp(JobUtils.scala:143)
at ir.fq.platform.common.hadoop.JobUtils$$anonfun$performRenames$2$$anonfun$3$$anon$2$$anonfun$run$2.apply(JobUtils.scala:143)
at ir.fq.platform.common.hadoop.JobUtils$$anonfun$performRenames$2$$anonfun$3$$anon$2$$anonfun$run$2.apply(JobUtils.scala:143)
at scala.util.Try$.apply(Try.scala:192)
at ir.fq.platform.common.hadoop.JobUtils$.logTry(JobUtils.scala:198)
at ir.fq.platform.common.hadoop.JobUtils$$anonfun$performRenames$2$$anonfun$3$$anon$2.run(JobUtils.scala:143)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
"pool-26-thread-7" #259 prio=5 os_prio=0 tid=0x00007f9373533800 nid=0x21e3 waiting on condition [0x00007f9332a32000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x0000000651a01a30> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
"pool-26-thread-6" #258 prio=5 os_prio=0 tid=0x00007f9373532800 nid=0x21e2 waiting on condition [0x00007f93287ed000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x0000000651a01a30> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
"pool-26-thread-5" #257 prio=5 os_prio=0 tid=0x00007f93734ee800 nid=0x21e1 waiting on condition [0x00007f933050f000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x0000000651a01a30> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
"pool-26-thread-4" #256 prio=5 os_prio=0 tid=0x00007f93734ed800 nid=0x21e0 waiting on condition [0x00007f932bbfa000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x0000000651a01a30> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
"pool-26-thread-3" #255 prio=5 os_prio=0 tid=0x00007f93734ec800 nid=0x21df waiting on condition [0x00007f933272f000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x0000000651a01a30> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
"pool-26-thread-2" #254 prio=5 os_prio=0 tid=0x00007f93734ec000 nid=0x21de waiting on condition [0x00007f9331e26000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x0000000651a01a30> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
"pool-26-thread-1" #253 prio=5 os_prio=0 tid=0x00007f93724d1000 nid=0x21dd waiting on condition [0x00007f9328bef000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x0000000651a01a30> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
"client DomainSocketWatcher" #64 daemon prio=5 os_prio=0 tid=0x00007f9372de3000 nid=0x1a31 runnable [0x00007f932b6f9000]
java.lang.Thread.State: RUNNABLE
at org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0(Native Method)
at org.apache.hadoop.net.unix.DomainSocketWatcher.access$900(DomainSocketWatcher.java:52)
at org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:503)
at java.lang.Thread.run(Thread.java:748)
"org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner" #22 daemon prio=5 os_prio=0 tid=0x00007f9371fac800 nid=0x1a08 in Object.wait() [0x00007f93381ed000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x0000000640012ec8> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:144)
- locked <0x0000000640012ec8> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:165)
at org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner.run(FileSystem.java:3694)
at java.lang.Thread.run(Thread.java:748)
"Service Thread" #17 daemon prio=9 os_prio=0 tid=0x00007f93700fd800 nid=0x1a00 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C1 CompilerThread11" #16 daemon prio=9 os_prio=0 tid=0x00007f93700fa800 nid=0x19ff waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C1 CompilerThread10" #15 daemon prio=9 os_prio=0 tid=0x00007f93700f8000 nid=0x19fe waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C1 CompilerThread9" #14 daemon prio=9 os_prio=0 tid=0x00007f93700f6800 nid=0x19fd waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C1 CompilerThread8" #13 daemon prio=9 os_prio=0 tid=0x00007f93700f4000 nid=0x19fc waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread7" #12 daemon prio=9 os_prio=0 tid=0x00007f93700f2000 nid=0x19fb waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread6" #11 daemon prio=9 os_prio=0 tid=0x00007f93700f0000 nid=0x19fa waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread5" #10 daemon prio=9 os_prio=0 tid=0x00007f93700ed800 nid=0x19f9 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread4" #9 daemon prio=9 os_prio=0 tid=0x00007f93700e3800 nid=0x19f8 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread3" #8 daemon prio=9 os_prio=0 tid=0x00007f93700e1800 nid=0x19f7 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread2" #7 daemon prio=9 os_prio=0 tid=0x00007f93700df800 nid=0x19f6 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread1" #6 daemon prio=9 os_prio=0 tid=0x00007f93700dd800 nid=0x19f5 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread0" #5 daemon prio=9 os_prio=0 tid=0x00007f93700da800 nid=0x19f4 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Signal Dispatcher" #4 daemon prio=9 os_prio=0 tid=0x00007f93700d9000 nid=0x19f3 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Finalizer" #3 daemon prio=8 os_prio=0 tid=0x00007f93700a7000 nid=0x19f2 in Object.wait() [0x00007f9340229000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:144)
- locked <0x000000064001e430> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:165)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:216)
"Reference Handler" #2 daemon prio=10 os_prio=0 tid=0x00007f93700a4800 nid=0x19f1 in Object.wait() [0x00007f934032a000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
- locked <0x000000064001e5e8> (a java.lang.ref.Reference$Lock)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)
"main" #1 prio=5 os_prio=0 tid=0x00007f937001b000 nid=0x19e2 waiting on condition [0x00007f9379214000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000000065189cde8> (a java.util.concurrent.CountDownLatch$Sync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
at ir.fq.platform.common.hadoop.JobUtils$.performRenames(JobUtils.scala:152)
at ir.fq.platform.common.hadoop.JobUtils$.commitOutput(JobUtils.scala:54)
at ir.fq.platform.jobs.BatchJobBase.commit(BatchJobBase.scala:112)
at ir.fq.platform.jobs.BatchJobBase$$anonfun$startJob$1.apply$mcV$sp(BatchJobBase.scala:173)
at ir.fq.platform.jobs.BatchJobBase$$anonfun$startJob$1.apply(BatchJobBase.scala:173)
at ir.fq.platform.jobs.BatchJobBase$$anonfun$startJob$1.apply(BatchJobBase.scala:173)
at ir.fq.platform.jobs.package$$anon$3.get(package.scala:127)
at net.jodah.failsafe.Functions.lambda$resultSupplierOf$11(Functions.java:283)
at net.jodah.failsafe.Functions$$Lambda$110/1341083542.get(Unknown Source)
at net.jodah.failsafe.internal.executor.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:67)
at net.jodah.failsafe.internal.executor.RetryPolicyExecutor$$Lambda$111/1027296777.get(Unknown Source)
at net.jodah.failsafe.Execution.executeSync(Execution.java:117)
at net.jodah.failsafe.FailsafeExecutor.call(FailsafeExecutor.java:319)
at net.jodah.failsafe.FailsafeExecutor.get(FailsafeExecutor.java:71)
at ir.fq.platform.jobs.package$.retryOn(package.scala:126)
at ir.fq.platform.jobs.BatchJobBase.startJob(BatchJobBase.scala:172)
at ir.fq.platform.jobs.BatchJobBase.main(BatchJobBase.scala:210)
at ir.fq.platform.jobs.enrich.BatchPixelEnricherJob.main(BatchPixelEnricherJob.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
"VM Thread" os_prio=0 tid=0x00007f937009a800 nid=0x19f0 runnable
"GC task thread#0 (ParallelGC)" os_prio=0 tid=0x00007f9370030000 nid=0x19e3 runnable
"GC task thread#1 (ParallelGC)" os_prio=0 tid=0x00007f9370031800 nid=0x19e4 runnable
"GC task thread#2 (ParallelGC)" os_prio=0 tid=0x00007f9370033800 nid=0x19e5 runnable
"GC task thread#3 (ParallelGC)" os_prio=0 tid=0x00007f9370035000 nid=0x19e6 runnable
"GC task thread#4 (ParallelGC)" os_prio=0 tid=0x00007f9370037000 nid=0x19e7 runnable
"GC task thread#5 (ParallelGC)" os_prio=0 tid=0x00007f9370038800 nid=0x19e8 runnable
"GC task thread#6 (ParallelGC)" os_prio=0 tid=0x00007f937003a800 nid=0x19e9 runnable
"GC task thread#7 (ParallelGC)" os_prio=0 tid=0x00007f937003c000 nid=0x19ea runnable
"GC task thread#8 (ParallelGC)" os_prio=0 tid=0x00007f937003e000 nid=0x19eb runnable
"GC task thread#9 (ParallelGC)" os_prio=0 tid=0x00007f937003f800 nid=0x19ec runnable
"GC task thread#10 (ParallelGC)" os_prio=0 tid=0x00007f9370041800 nid=0x19ed runnable
"GC task thread#11 (ParallelGC)" os_prio=0 tid=0x00007f9370043000 nid=0x19ee runnable
"GC task thread#12 (ParallelGC)" os_prio=0 tid=0x00007f9370045000 nid=0x19ef runnable
"VM Periodic Task Thread" os_prio=0 tid=0x00007f9370100000 nid=0x1a01 waiting on condition
JNI global references: 429
Heap
PSYoungGen total 1398272K, used 321216K [0x0000000740000000, 0x00000007c0000000, 0x00000007c0000000)
eden space 699392K, 45% used [0x0000000740000000,0x00000007539b0198,0x000000076ab00000)
from space 698880K, 0% used [0x000000076ab00000,0x000000076ab00000,0x0000000795580000)
to space 698880K, 0% used [0x0000000795580000,0x0000000795580000,0x00000007c0000000)
ParOldGen total 3855872K, used 980697K [0x0000000640000000, 0x000000072b580000, 0x0000000740000000)
object space 3855872K, 25% used [0x0000000640000000,0x000000067bdb6758,0x000000072b580000)
Metaspace used 82718K, capacity 83852K, committed 84096K, reserved 1122304K
class space used 10818K, capacity 11086K, committed 11136K, reserved 1048576K
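The pattern visible in the dump is an untimed FutureTask.get() in pool-26-thread-8 (state WAITING, not TIMED_WAITING) on a task whose worker never returns from the TLS handshake. As a rough illustration only, and not the connector's actual code, the sketch below shows how a caller could bound such a wait with the timed Future.get overload. The class name BoundedWaitSketch, the method awaitWithTimeout, the thread name, and the 200 ms timeout are all hypothetical; the never-released latch merely stands in for the hung request.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: none of these names come from the GCS connector.
class BoundedWaitSketch {

    /**
     * Waits for the future for at most timeoutMillis.
     * Returns true if the task completed normally, false if the wait timed out
     * (in which case the task is cancelled with an interrupt).
     */
    static boolean awaitWithTimeout(Future<?> future, long timeoutMillis) {
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // cancel(true) only interrupts the worker thread. A thread blocked in
            // native code (such as a TLS handshake) may not react to the interrupt,
            // but the caller is at least unblocked and can fail or retry.
            future.cancel(true);
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        }
    }

    public static void main(String[] args) {
        // Daemon worker thread, mirroring the daemon batch-helper thread in the dump.
        ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "hypothetical-batch-helper");
            t.setDaemon(true);
            return t;
        });

        // Stand-in for a request that never completes.
        CountDownLatch never = new CountDownLatch(1);
        Future<?> stuck = pool.submit(() -> {
            never.await();
            return null;
        });

        boolean completed = awaitWithTimeout(stuck, 200);
        System.out.println("completed=" + completed); // prints "completed=false"
        pool.shutdownNow();
    }
}
```

The same idea applies to the main thread, which is parked in an untimed CountDownLatch.await(); the overload await(long, TimeUnit) would let the commit step give up and surface an error instead of waiting for hours.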
Issue Analytics
- Created 5 years ago
- Comments: 17 (8 by maintainers)
This is not caused by the GCS connector; it is a Conscrypt bug. Please raise this issue in the Conscrypt repository.
For reference, the GCS connector uses Conscrypt 1.4.2 (upgrading to newer versions is blocked on https://github.com/google/conscrypt/issues/834).
Any news on this?
I think we’re having a similar issue.
It has been stuck there for a few hours already. Normally this stage takes less than a minute.
This is running on Dataproc 1.4.27-debian9.