Pipe broken exception on write to GCS
I’m importing some data into Apache Accumulo, which runs on top of Google Cloud Storage (as an HDFS replacement). I use GCS connector 1.8.1-hadoop2, and Accumulo runs on GCloud VMs.
I see the following exceptions in the logs quite frequently (the first on GoogleHadoopOutputStream.write, the second on GoogleHadoopOutputStream.close):
java.io.IOException: java.io.IOException: Pipe broken
at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.waitForCompletionAndThrowIfUploadFailed(AbstractGoogleAsyncWriteChannel.java:432)
at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.write(AbstractGoogleAsyncWriteChannel.java:256)
at java.nio.channels.Channels.writeFullyImpl(Channels.java:78)
at java.nio.channels.Channels.writeFully(Channels.java:101)
at java.nio.channels.Channels.access$000(Channels.java:61)
at java.nio.channels.Channels$1.write(Channels.java:174)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:95)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopOutputStream.write(GoogleHadoopOutputStream.java:96)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:50)
at java.io.DataOutputStream.write(DataOutputStream.java:88)
at java.io.DataOutputStream.writeByte(DataOutputStream.java:153)
at org.apache.accumulo.tserver.logger.LogFileKey.write(LogFileKey.java:89)
at org.apache.accumulo.tserver.log.DfsLogger.write(DfsLogger.java:616)
at org.apache.accumulo.tserver.log.DfsLogger.logFileData(DfsLogger.java:633)
at org.apache.accumulo.tserver.log.DfsLogger.logManyTablets(DfsLogger.java:673)
at org.apache.accumulo.tserver.log.TabletServerLogger$7.write(TabletServerLogger.java:533)
at org.apache.accumulo.tserver.log.TabletServerLogger.write(TabletServerLogger.java:420)
at org.apache.accumulo.tserver.log.TabletServerLogger.write(TabletServerLogger.java:371)
at org.apache.accumulo.tserver.log.TabletServerLogger.logManyTablets(TabletServerLogger.java:523)
at org.apache.accumulo.tserver.TabletServer$ThriftClientHandler.flush(TabletServer.java:1030)
at org.apache.accumulo.tserver.TabletServer$ThriftClientHandler.closeUpdate(TabletServer.java:1118)
at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.accumulo.core.trace.wrappers.RpcServerInvocationHandler.invoke(RpcServerInvocationHandler.java:46)
at org.apache.accumulo.server.rpc.RpcWrapper$1.invoke(RpcWrapper.java:83)
at com.sun.proxy.$Proxy17.closeUpdate(Unknown Source)
at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$closeUpdate.getResult(TabletClientService.java:2501)
at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$closeUpdate.getResult(TabletClientService.java:2485)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.accumulo.server.rpc.TimedProcessor.process(TimedProcessor.java:65)
at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:518)
at org.apache.accumulo.server.rpc.CustomNonBlockingServer$CustomFrameBuffer.invoke(CustomNonBlockingServer.java:113)
at org.apache.thrift.server.Invocation.run(Invocation.java:18)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Pipe broken
at java.io.PipedInputStream.read(PipedInputStream.java:321)
at java.io.PipedInputStream.read(PipedInputStream.java:377)
at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.util.ByteStreams.read(ByteStreams.java:181)
at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.media.MediaHttpUploader.setContentAndHeadersOnCurrentRequest(MediaHttpUploader.java:629)
at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:409)
at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336)
at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:427)
at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:358)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
... 1 more
java.io.IOException: java.io.IOException: Pipe broken
at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.waitForCompletionAndThrowIfUploadFailed(AbstractGoogleAsyncWriteChannel.java:432)
at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.write(AbstractGoogleAsyncWriteChannel.java:256)
at java.nio.channels.Channels.writeFullyImpl(Channels.java:78)
at java.nio.channels.Channels.writeFully(Channels.java:101)
at java.nio.channels.Channels.access$000(Channels.java:61)
at java.nio.channels.Channels$1.write(Channels.java:174)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
at java.io.FilterOutputStream.close(FilterOutputStream.java:158)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopOutputStream.close(GoogleHadoopOutputStream.java:126)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
at org.apache.accumulo.tserver.log.DfsLogger.close(DfsLogger.java:592)
at org.apache.accumulo.tserver.log.TabletServerLogger.close(TabletServerLogger.java:338)
at org.apache.accumulo.tserver.log.TabletServerLogger.access$1000(TabletServerLogger.java:70)
at org.apache.accumulo.tserver.log.TabletServerLogger$3.withWriteLock(TabletServerLogger.java:455)
at org.apache.accumulo.tserver.log.TabletServerLogger.testLockAndRun(TabletServerLogger.java:137)
at org.apache.accumulo.tserver.log.TabletServerLogger.write(TabletServerLogger.java:446)
at org.apache.accumulo.tserver.log.TabletServerLogger.write(TabletServerLogger.java:371)
at org.apache.accumulo.tserver.log.TabletServerLogger.logManyTablets(TabletServerLogger.java:523)
at org.apache.accumulo.tserver.TabletServer$ThriftClientHandler.flush(TabletServer.java:1030)
at org.apache.accumulo.tserver.TabletServer$ThriftClientHandler.closeUpdate(TabletServer.java:1118)
at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.accumulo.core.trace.wrappers.RpcServerInvocationHandler.invoke(RpcServerInvocationHandler.java:46)
at org.apache.accumulo.server.rpc.RpcWrapper$1.invoke(RpcWrapper.java:83)
at com.sun.proxy.$Proxy17.closeUpdate(Unknown Source)
at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$closeUpdate.getResult(TabletClientService.java:2501)
at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$closeUpdate.getResult(TabletClientService.java:2485)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.accumulo.server.rpc.TimedProcessor.process(TimedProcessor.java:65)
at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:518)
at org.apache.accumulo.server.rpc.CustomNonBlockingServer$CustomFrameBuffer.invoke(CustomNonBlockingServer.java:113)
at org.apache.thrift.server.Invocation.run(Invocation.java:18)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
at java.lang.Thread.run(Thread.java:748)
Suppressed: java.io.IOException: java.io.IOException: Pipe broken
at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.waitForCompletionAndThrowIfUploadFailed(AbstractGoogleAsyncWriteChannel.java:432)
at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.close(AbstractGoogleAsyncWriteChannel.java:287)
at java.nio.channels.Channels$1.close(Channels.java:178)
at java.io.FilterOutputStream.close(FilterOutputStream.java:159)
... 31 more
Caused by: java.io.IOException: Pipe broken
at java.io.PipedInputStream.read(PipedInputStream.java:321)
at java.io.PipedInputStream.read(PipedInputStream.java:377)
at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.util.ByteStreams.read(ByteStreams.java:181)
at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.media.MediaHttpUploader.setContentAndHeadersOnCurrentRequest(MediaHttpUploader.java:629)
at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:409)
at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336)
at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:427)
at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:358)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
... 1 more
[CIRCULAR REFERENCE:java.io.IOException: Pipe broken]
Accumulo logs this exception at the ERROR level.
What could be the root cause? How can I get more details about the exception (debug logs, etc.)? Thank you!
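(Editorial aside on the debug-logs part of the question: the GCS connector logs through Hadoop’s standard log4j configuration. A minimal snippet, assuming log4j 1.x as bundled with Hadoop 2.x; the logger names below are taken from the packages in the stack traces above, and the shaded connector jar repackages some of them under com.google.cloud.hadoop.repackaged.gcs, so adjust to match your deployment.)

```properties
# Raise GCS connector logging to DEBUG (package names from the stack traces;
# verify against the jar actually on your classpath).
log4j.logger.com.google.cloud.hadoop.fs.gcs=DEBUG
log4j.logger.com.google.cloud.hadoop.gcsio=DEBUG
log4j.logger.com.google.cloud.hadoop.repackaged.gcs=DEBUG
```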
@medb The Apache Accumulo issue you referenced did not conclude that GCS wouldn’t, or couldn’t, be supported. That issue was closed because the user’s question, about the explanation for the behavior they were seeing, had been answered.
The supported solution is to use a LogCloser, configured on the user’s classpath for Accumulo, that handles closing logs on GCS. I don’t know enough about GCS to know for sure, but it may be sufficient to trivially fork Accumulo’s built-in HadoopLogCloser and do nothing instead of throwing the IllegalStateException when the FileSystem is GCS (essentially, no attempt at lease recovery, just like in the local file system case).
I do not think that the issue has anything to do with Accumulo’s write pattern… as suggested here… at least, not if it’s the same issue as the one you referenced. It’s likely a simple matter of implementing an appropriate LogCloser.
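A minimal sketch of what such a fork might look like, assuming the Accumulo 1.x LogCloser interface and the VolumeManager-based close() signature as found in Accumulo 1.9’s HadoopLogCloser (both may differ across versions; the class name is hypothetical and the code is untested):

```java
import java.io.IOException;

import org.apache.accumulo.core.conf.AccumuloConfiguration;
import org.apache.accumulo.server.fs.VolumeManager;
import org.apache.accumulo.server.master.recovery.HadoopLogCloser;
import org.apache.accumulo.server.master.recovery.LogCloser;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Hypothetical LogCloser that skips lease recovery for gs:// paths.
 * GCS has no leases to recover: once the writer's upload fails or
 * completes, no other process holds the object open.
 */
public class GcsLogCloser implements LogCloser {

  private final LogCloser delegate = new HadoopLogCloser();

  @Override
  public long close(AccumuloConfiguration conf, VolumeManager fs, Path source)
      throws IOException {
    FileSystem ns = fs.getVolumeByPath(source).getFileSystem();
    if ("gs".equals(ns.getUri().getScheme())) {
      // Mirror the local-file-system branch of HadoopLogCloser: nothing to
      // do, the log is already "closed" from GCS's point of view.
      return 0;
    }
    // Fall back to stock behavior (HDFS lease recovery, etc.).
    return delegate.close(conf, fs, source);
  }
}
```

If the signature matches your release, Accumulo would then be pointed at the class via the master.walog.closer.implementation property (again, verify the property name for your version).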
The problem is that GCS, Azure Blob Storage, and AWS S3 are not file systems but object stores, and Apache Accumulo was written with HDFS capabilities in mind, which object stores cannot fully support.
The GCS connector tries to mimic HDFS semantics, but because of object store limitations it cannot do so fully.
We need to take a look at Accumulo’s use case to determine whether it is possible to make it work with GCS, but because Accumulo is not currently supported by the GCS connector, this is not an immediate action item for us.
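To make the gap concrete, here is a small, hypothetical probe (bucket name and path are placeholders) of two HDFS-era calls that write-ahead-log code leans on and that object-store connectors of this era typically cannot honor the same way, namely hflush() durability and append():

```java
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ObjectStoreSemanticsProbe {
  public static void main(String[] args) throws IOException {
    // "gs://my-bucket" is a placeholder; any writable bucket works.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(URI.create("gs://my-bucket/"), conf);
    Path p = new Path("gs://my-bucket/probe/test");

    try (FSDataOutputStream out = fs.create(p, true)) {
      out.writeBytes("entry");
      // On HDFS, hflush() makes the bytes visible to readers immediately.
      // The GCS connector (1.8.x) streams into a single resumable upload,
      // so nothing is visible until close() commits the object.
      out.hflush();
    }

    try {
      // Append is not a native object-store operation; connectors of this
      // era generally reject it rather than emulate it.
      fs.append(p).close();
    } catch (IOException | UnsupportedOperationException e) {
      System.out.println("append rejected: " + e.getMessage());
    }
  }
}
```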