question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

GoogleCloudStorageFileSystem.create may block indefinitely (potential deadlock in BatchHelper)

See original GitHub issue

We run hourly jobs in dataproc which commit output by moving files from a staging directory in hdfs (mounted on local SSDs) to a bucket in gcs. On occasion (roughly once a week), we observe a job that appears to be “stuck” indefinitely while committing output; this process takes ~1.5 minutes on average, but during these anomalous runs, it can take hours; two days ago it took over 9 hours.

Finally, I was able to capture the stacktrace by logging into the instance ~20 minutes after it got “stuck” committing (i.e., moving files). On close examination this appears to be a deadlock caused by the (now) daemon thread raised in this issue: https://github.com/GoogleCloudPlatform/bigdata-interop/issues/150.

Below is the full stacktrace. In main thread, commitOutput triggers the process described above. Several threads are dispatched to perform concurrent renames. In pool-26-thread-8, we can see GoogleCloudStorageFileSystem.create on the stack as a result of FileUtil.copy; it’s awaiting completion of the batch as a result of BatchHelper.awaitRequestsCompletion which in turn appears to be deadlocked owing to BatchHelper.execute in gcsfs-batch-helper-3912 (daemon) thread.

Full thread dump OpenJDK 64-Bit Server VM (25.181-b13 mixed mode):

"gcsfs-batch-helper-3912" #4130 daemon prio=5 os_prio=0 tid=0x00007f92a4040800 nid=0x3130 runnable [0x00007f9330815000]
   java.lang.Thread.State: RUNNABLE
	at org.conscrypt.NativeCrypto.SSL_do_handshake(Native Method)
	at org.conscrypt.NativeSsl.doHandshake(NativeSsl.java:392)
	at org.conscrypt.ConscryptFileDescriptorSocket.startHandshake(ConscryptFileDescriptorSocket.java:225)
	at org.conscrypt.ConscryptFileDescriptorSocket.waitForHandshake(ConscryptFileDescriptorSocket.java:474)
	at org.conscrypt.ConscryptFileDescriptorSocket.getOutputStream(ConscryptFileDescriptorSocket.java:461)
	at sun.net.www.http.HttpClient.openServer(HttpClient.java:465)
	at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
	- locked <0x0000000651a1d8a8> (a sun.net.www.protocol.https.HttpsClient)
	at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:264)
	at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:367)
	at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:191)
	at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1156)
	at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1050)
	at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:177)
	at sun.net.www.protocol.https.HttpsURLConnectionImpl.connect(HttpsURLConnectionImpl.java:162)
	at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:104)
	at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.http.HttpRequest.execute(HttpRequest.java:981)
	at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419)
	at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
	at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
	at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.BatchHelper.execute(BatchHelper.java:175)
	at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.BatchHelper.lambda$queue$0(BatchHelper.java:163)
	at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.BatchHelper$$Lambda$96/631118845.call(Unknown Source)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

"org.apache.hadoop.hdfs.PeerCache@65577e18" #590 daemon prio=5 os_prio=0 tid=0x00007f91fc007800 nid=0x2335 waiting on condition [0x00007f91bcefa000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
	at java.lang.Thread.sleep(Native Method)
	at org.apache.hadoop.hdfs.PeerCache.run(PeerCache.java:253)
	at org.apache.hadoop.hdfs.PeerCache.access$000(PeerCache.java:46)
	at org.apache.hadoop.hdfs.PeerCache$1.run(PeerCache.java:124)
	at java.lang.Thread.run(Thread.java:748)

"pool-26-thread-16" #268 prio=5 os_prio=0 tid=0x00007f937353d000 nid=0x21ec waiting on condition [0x00007f9322efc000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x0000000651a01a30> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

"pool-26-thread-15" #267 prio=5 os_prio=0 tid=0x00007f937353c000 nid=0x21eb waiting on condition [0x00007f9322cfa000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x0000000651a01a30> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

"pool-26-thread-14" #266 prio=5 os_prio=0 tid=0x00007f937353b000 nid=0x21ea waiting on condition [0x00007f9322bf9000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x0000000651a01a30> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

"pool-26-thread-13" #265 prio=5 os_prio=0 tid=0x00007f937353a000 nid=0x21e9 waiting on condition [0x00007f93286ec000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x0000000651a01a30> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

"pool-26-thread-12" #264 prio=5 os_prio=0 tid=0x00007f9373539000 nid=0x21e8 waiting on condition [0x00007f9332931000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x0000000651a01a30> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

"pool-26-thread-11" #263 prio=5 os_prio=0 tid=0x00007f9373538000 nid=0x21e7 waiting on condition [0x00007f933222a000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x0000000651a01a30> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

"pool-26-thread-10" #262 prio=5 os_prio=0 tid=0x00007f9373536800 nid=0x21e6 waiting on condition [0x00007f9332129000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x0000000651a01a30> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

"pool-26-thread-9" #261 prio=5 os_prio=0 tid=0x00007f9373535800 nid=0x21e5 waiting on condition [0x00007f933262e000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x0000000651a01a30> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

"pool-26-thread-8" #260 prio=5 os_prio=0 tid=0x00007f9373534800 nid=0x21e4 waiting on condition [0x00007f9331d24000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x0000000651a1dbd8> (a java.util.concurrent.FutureTask)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429)
	at java.util.concurrent.FutureTask.get(FutureTask.java:191)
	at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.BatchHelper.awaitRequestsCompletion(BatchHelper.java:263)
	at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.BatchHelper.flushIfPossible(BatchHelper.java:204)
	at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.BatchHelper.flush(BatchHelper.java:235)
	at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.getItemInfos(GoogleCloudStorageImpl.java:1723)
	at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageFileSystem.getFileInfos(GoogleCloudStorageFileSystem.java:1211)
	at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageFileSystem.mkdirs(GoogleCloudStorageFileSystem.java:514)
	at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageFileSystem.create(GoogleCloudStorageFileSystem.java:243)
	at com.google.cloud.hadoop.fs.gcs.GoogleHadoopOutputStream.createChannel(GoogleHadoopOutputStream.java:82)
	at com.google.cloud.hadoop.fs.gcs.GoogleHadoopOutputStream.<init>(GoogleHadoopOutputStream.java:74)
	at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.create(GoogleHadoopFileSystemBase.java:768)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1067)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1048)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:937)
	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:391)
	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:364)
	at ir.fq.platform.common.hadoop.IOUtils.rename(IOUtils.java:102)
	at ir.fq.platform.common.hadoop.JobUtils$$anonfun$performRenames$2$$anonfun$3$$anon$2$$anonfun$run$2.apply$mcZ$sp(JobUtils.scala:143)
	at ir.fq.platform.common.hadoop.JobUtils$$anonfun$performRenames$2$$anonfun$3$$anon$2$$anonfun$run$2.apply(JobUtils.scala:143)
	at ir.fq.platform.common.hadoop.JobUtils$$anonfun$performRenames$2$$anonfun$3$$anon$2$$anonfun$run$2.apply(JobUtils.scala:143)
	at scala.util.Try$.apply(Try.scala:192)
	at ir.fq.platform.common.hadoop.JobUtils$.logTry(JobUtils.scala:198)
	at ir.fq.platform.common.hadoop.JobUtils$$anonfun$performRenames$2$$anonfun$3$$anon$2.run(JobUtils.scala:143)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

"pool-26-thread-7" #259 prio=5 os_prio=0 tid=0x00007f9373533800 nid=0x21e3 waiting on condition [0x00007f9332a32000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x0000000651a01a30> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

"pool-26-thread-6" #258 prio=5 os_prio=0 tid=0x00007f9373532800 nid=0x21e2 waiting on condition [0x00007f93287ed000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x0000000651a01a30> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

"pool-26-thread-5" #257 prio=5 os_prio=0 tid=0x00007f93734ee800 nid=0x21e1 waiting on condition [0x00007f933050f000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x0000000651a01a30> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

"pool-26-thread-4" #256 prio=5 os_prio=0 tid=0x00007f93734ed800 nid=0x21e0 waiting on condition [0x00007f932bbfa000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x0000000651a01a30> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

"pool-26-thread-3" #255 prio=5 os_prio=0 tid=0x00007f93734ec800 nid=0x21df waiting on condition [0x00007f933272f000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x0000000651a01a30> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

"pool-26-thread-2" #254 prio=5 os_prio=0 tid=0x00007f93734ec000 nid=0x21de waiting on condition [0x00007f9331e26000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x0000000651a01a30> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

"pool-26-thread-1" #253 prio=5 os_prio=0 tid=0x00007f93724d1000 nid=0x21dd waiting on condition [0x00007f9328bef000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x0000000651a01a30> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

"client DomainSocketWatcher" #64 daemon prio=5 os_prio=0 tid=0x00007f9372de3000 nid=0x1a31 runnable [0x00007f932b6f9000]
   java.lang.Thread.State: RUNNABLE
	at org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0(Native Method)
	at org.apache.hadoop.net.unix.DomainSocketWatcher.access$900(DomainSocketWatcher.java:52)
	at org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:503)
	at java.lang.Thread.run(Thread.java:748)

"org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner" #22 daemon prio=5 os_prio=0 tid=0x00007f9371fac800 nid=0x1a08 in Object.wait() [0x00007f93381ed000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x0000000640012ec8> (a java.lang.ref.ReferenceQueue$Lock)
	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:144)
	- locked <0x0000000640012ec8> (a java.lang.ref.ReferenceQueue$Lock)
	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:165)
	at org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner.run(FileSystem.java:3694)
	at java.lang.Thread.run(Thread.java:748)

"Service Thread" #17 daemon prio=9 os_prio=0 tid=0x00007f93700fd800 nid=0x1a00 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C1 CompilerThread11" #16 daemon prio=9 os_prio=0 tid=0x00007f93700fa800 nid=0x19ff waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C1 CompilerThread10" #15 daemon prio=9 os_prio=0 tid=0x00007f93700f8000 nid=0x19fe waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C1 CompilerThread9" #14 daemon prio=9 os_prio=0 tid=0x00007f93700f6800 nid=0x19fd waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C1 CompilerThread8" #13 daemon prio=9 os_prio=0 tid=0x00007f93700f4000 nid=0x19fc waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread7" #12 daemon prio=9 os_prio=0 tid=0x00007f93700f2000 nid=0x19fb waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread6" #11 daemon prio=9 os_prio=0 tid=0x00007f93700f0000 nid=0x19fa waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread5" #10 daemon prio=9 os_prio=0 tid=0x00007f93700ed800 nid=0x19f9 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread4" #9 daemon prio=9 os_prio=0 tid=0x00007f93700e3800 nid=0x19f8 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread3" #8 daemon prio=9 os_prio=0 tid=0x00007f93700e1800 nid=0x19f7 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread2" #7 daemon prio=9 os_prio=0 tid=0x00007f93700df800 nid=0x19f6 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread1" #6 daemon prio=9 os_prio=0 tid=0x00007f93700dd800 nid=0x19f5 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" #5 daemon prio=9 os_prio=0 tid=0x00007f93700da800 nid=0x19f4 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" #4 daemon prio=9 os_prio=0 tid=0x00007f93700d9000 nid=0x19f3 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" #3 daemon prio=8 os_prio=0 tid=0x00007f93700a7000 nid=0x19f2 in Object.wait() [0x00007f9340229000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:144)
	- locked <0x000000064001e430> (a java.lang.ref.ReferenceQueue$Lock)
	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:165)
	at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:216)

"Reference Handler" #2 daemon prio=10 os_prio=0 tid=0x00007f93700a4800 nid=0x19f1 in Object.wait() [0x00007f934032a000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	at java.lang.Object.wait(Object.java:502)
	at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
	- locked <0x000000064001e5e8> (a java.lang.ref.Reference$Lock)
	at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)

"main" #1 prio=5 os_prio=0 tid=0x00007f937001b000 nid=0x19e2 waiting on condition [0x00007f9379214000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x000000065189cde8> (a java.util.concurrent.CountDownLatch$Sync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
	at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
	at ir.fq.platform.common.hadoop.JobUtils$.performRenames(JobUtils.scala:152)
	at ir.fq.platform.common.hadoop.JobUtils$.commitOutput(JobUtils.scala:54)
	at ir.fq.platform.jobs.BatchJobBase.commit(BatchJobBase.scala:112)
	at ir.fq.platform.jobs.BatchJobBase$$anonfun$startJob$1.apply$mcV$sp(BatchJobBase.scala:173)
	at ir.fq.platform.jobs.BatchJobBase$$anonfun$startJob$1.apply(BatchJobBase.scala:173)
	at ir.fq.platform.jobs.BatchJobBase$$anonfun$startJob$1.apply(BatchJobBase.scala:173)
	at ir.fq.platform.jobs.package$$anon$3.get(package.scala:127)
	at net.jodah.failsafe.Functions.lambda$resultSupplierOf$11(Functions.java:283)
	at net.jodah.failsafe.Functions$$Lambda$110/1341083542.get(Unknown Source)
	at net.jodah.failsafe.internal.executor.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:67)
	at net.jodah.failsafe.internal.executor.RetryPolicyExecutor$$Lambda$111/1027296777.get(Unknown Source)
	at net.jodah.failsafe.Execution.executeSync(Execution.java:117)
	at net.jodah.failsafe.FailsafeExecutor.call(FailsafeExecutor.java:319)
	at net.jodah.failsafe.FailsafeExecutor.get(FailsafeExecutor.java:71)
	at ir.fq.platform.jobs.package$.retryOn(package.scala:126)
	at ir.fq.platform.jobs.BatchJobBase.startJob(BatchJobBase.scala:172)
	at ir.fq.platform.jobs.BatchJobBase.main(BatchJobBase.scala:210)
	at ir.fq.platform.jobs.enrich.BatchPixelEnricherJob.main(BatchPixelEnricherJob.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

"VM Thread" os_prio=0 tid=0x00007f937009a800 nid=0x19f0 runnable 

"GC task thread#0 (ParallelGC)" os_prio=0 tid=0x00007f9370030000 nid=0x19e3 runnable 

"GC task thread#1 (ParallelGC)" os_prio=0 tid=0x00007f9370031800 nid=0x19e4 runnable 

"GC task thread#2 (ParallelGC)" os_prio=0 tid=0x00007f9370033800 nid=0x19e5 runnable 

"GC task thread#3 (ParallelGC)" os_prio=0 tid=0x00007f9370035000 nid=0x19e6 runnable 

"GC task thread#4 (ParallelGC)" os_prio=0 tid=0x00007f9370037000 nid=0x19e7 runnable 

"GC task thread#5 (ParallelGC)" os_prio=0 tid=0x00007f9370038800 nid=0x19e8 runnable 

"GC task thread#6 (ParallelGC)" os_prio=0 tid=0x00007f937003a800 nid=0x19e9 runnable 

"GC task thread#7 (ParallelGC)" os_prio=0 tid=0x00007f937003c000 nid=0x19ea runnable 

"GC task thread#8 (ParallelGC)" os_prio=0 tid=0x00007f937003e000 nid=0x19eb runnable 

"GC task thread#9 (ParallelGC)" os_prio=0 tid=0x00007f937003f800 nid=0x19ec runnable 

"GC task thread#10 (ParallelGC)" os_prio=0 tid=0x00007f9370041800 nid=0x19ed runnable 

"GC task thread#11 (ParallelGC)" os_prio=0 tid=0x00007f9370043000 nid=0x19ee runnable 

"GC task thread#12 (ParallelGC)" os_prio=0 tid=0x00007f9370045000 nid=0x19ef runnable 

"VM Periodic Task Thread" os_prio=0 tid=0x00007f9370100000 nid=0x1a01 waiting on condition 

JNI global references: 429

Heap
 PSYoungGen      total 1398272K, used 321216K [0x0000000740000000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 699392K, 45% used [0x0000000740000000,0x00000007539b0198,0x000000076ab00000)
  from space 698880K, 0% used [0x000000076ab00000,0x000000076ab00000,0x0000000795580000)
  to   space 698880K, 0% used [0x0000000795580000,0x0000000795580000,0x00000007c0000000)
 ParOldGen       total 3855872K, used 980697K [0x0000000640000000, 0x000000072b580000, 0x0000000740000000)
  object space 3855872K, 25% used [0x0000000640000000,0x000000067bdb6758,0x000000072b580000)
 Metaspace       used 82718K, capacity 83852K, committed 84096K, reserved 1122304K
  class space    used 10818K, capacity 11086K, committed 11136K, reserved 1048576K

Issue Analytics

  • State:closed
  • Created 5 years ago
  • Comments:17 (8 by maintainers)

github_iconTop GitHub Comments

2reactions
medbcommented, May 25, 2020

Not from GCS connector side, this is a Conscrypt bug, please rise this issue in Conscrypt repository.

For reference, GCS connector uses Conscrypt 1.4.2 (upgrade to newer versions is blocked on https://github.com/google/conscrypt/issues/834)

0reactions
tabordacommented, May 25, 2020

Any news on this?

I think we’re having a similar issue.

"gcs-async-channel-pool-2" #988 daemon prio=5 os_prio=0 tid=0x00007fb3201a7000 nid=0x63c5 runnable [0x00007fb2e15f1000]
   java.lang.Thread.State: RUNNABLE
	at org.conscrypt.NativeCrypto.SSL_do_handshake(Native Method)
	at org.conscrypt.NativeSsl.doHandshake(NativeSsl.java:385)
	at org.conscrypt.ConscryptFileDescriptorSocket.startHandshake(ConscryptFileDescriptorSocket.java:225)
	at org.conscrypt.ConscryptFileDescriptorSocket.waitForHandshake(ConscryptFileDescriptorSocket.java:475)
	at org.conscrypt.ConscryptFileDescriptorSocket.getOutputStream(ConscryptFileDescriptorSocket.java:462)
	at sun.net.www.http.HttpClient.openServer(HttpClient.java:465)
	at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
	- locked <0x0000000737c30180> (a sun.net.www.protocol.https.HttpsClient)
	at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:264)
	at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:367)
	at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:191)
	at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1162)
	at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1056)
	at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:177)
	at sun.net.www.protocol.http.HttpURLConnection.getOutputStream0(HttpURLConnection.java:1340)
	- locked <0x0000000737c30298> (a sun.net.www.protocol.https.DelegateHttpsURLConnection)
	at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1315)
	- locked <0x0000000737c30298> (a sun.net.www.protocol.https.DelegateHttpsURLConnection)
	at sun.net.www.protocol.https.HttpsURLConnectionImpl.getOutputStream(HttpsURLConnectionImpl.java:264)
	- locked <0x0000000737c30380> (a sun.net.www.protocol.https.HttpsURLConnectionImpl)
	at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:113)
	at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:84)
	at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1011)
	at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.media.MediaHttpUploader.executeCurrentRequestWithoutGZip(MediaHttpUploader.java:548)
	at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.media.MediaHttpUploader.executeCurrentRequest(MediaHttpUploader.java:565)
	at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:422)
	at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336)
	at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:508)
	at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:432)
	at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:549)
	at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:301)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
----------------- 25541 -----------------
0x00007fb34743f8bd	__poll + 0x2d

It is stuck there for a few hours already. Normally this stage takes less than a minute.

This is running on dataproc 1.4.27-debian9

Read more comments on GitHub >

github_iconTop Results From Across the Web

Deadlock in Java - YouTube
Deadlock is a situation that can occur when two or more threads are blocked indefinitely trying to obtain access to a resource locked...
Read more >
hadoop-connectors
GoogleCloudStorageFileSystem.create may block indefinitely (potential deadlock in BatchHelper). githubwua. githubwua CLOSED · Updated 3 years ago ...
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found