GcsSession.readRaw Does Not Play Nice With MessageSourceMutator afterReceive (for deleting) When Streaming
Please bear with me on this one.
I'm creating a supplier stream-application for spring-cloud-dataflow that picks up files from a GCS bucket and emits the data contained therein. The supplier function is built almost verbatim from the existing sftp stream-application. When a file is received, an `org.springframework.integration.aop.MessageSourceMutator` hook removes the file (`FileDeletingMessageAdvice`), and a subsequent step reads the channel and emits the data. Almost everything works correctly with spring-integration, except for the following:
- The repo is missing several files needed for full integration, which I had to supplement (mostly DSL files such as `GcsInboundChannelAdapterSpec`, `GcsOutboundGatewaySpec`, `GcsStreamingInboundChannelAdapterSpec`, and a class to return the adapters). I can submit a PR/MR if you want.
- When streaming, `GcsSession.readRaw` returns a `ChannelInputStream` backed by a `BlobReadChannel`. When the `afterReceive` method of the `MessageSourceMutator` runs, it removes the file, and any subsequent attempt to read from the `InputStream` returns a 404 error (since the file is no longer there). The same thing happens during a rename.
```
2022-11-01 12:44:47.458 ERROR 65889 --- [oundedElastic-2] o.s.integration.handler.LoggingHandler : org.springframework.messaging.MessageHandlingException: IOException while iterating; nested exception is java.io.IOException: com.google.cloud.RetryHelper$RetryHelperException: com.google.cloud.storage.StorageException: 404 Not Found
GET https://storage.googleapis.com/download/storage/v1/b/<bucket-name>/o/<filename>?alt=media
No such object: <bucket-name>/<filename>, failedMessage=GenericMessage [payload=sun.nio.ch.ChannelInputStream@1d6c033b, headers={file_remoteHostPort=storage.googleapis.com:443, file_remoteFileInfo={"directory":false,"filename":"<filename>","link":false,"modified":1667312728832,"permissions":"Use [BlobInfo.getAcl()] to obtain permissions.","remoteDirectory":"<bucket-name>","size":149619}, file_remoteDirectory=<bucket-name>, id=<id>, contentType=text/plain, closeableResource=org.springframework.integration.file.remote.session.CachingSessionFactory$CachedSession@6e67b353, file_remoteFile=<filename>, timestamp=1667321087373}]
	at org.springframework.integration.file.splitter.FileSplitter$FileIterator.hasNext(FileSplitter.java:344)
	at reactor.core.publisher.FluxIterable.subscribe(FluxIterable.java:133)
	<snip>
Caused by: java.io.IOException: com.google.cloud.RetryHelper$RetryHelperException: com.google.cloud.storage.StorageException: 404 Not Found
GET https://storage.googleapis.com/download/storage/v1/b/<bucket-name>/o/<filename>?alt=media
No such object: <bucket-name>/<filename>
	at com.google.cloud.storage.BlobReadChannel.read(BlobReadChannel.java:149)
	at java.base/sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:65)
	at java.base/sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:107)
	at java.base/sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:101)
	at java.base/sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:270)
	at java.base/sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:313)
	at java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:188)
	at java.base/java.io.InputStreamReader.read(InputStreamReader.java:177)
	at java.base/java.io.BufferedReader.fill(BufferedReader.java:162)
	at java.base/java.io.BufferedReader.readLine(BufferedReader.java:329)
	at java.base/java.io.BufferedReader.readLine(BufferedReader.java:396)
	at org.springframework.integration.file.splitter.FileSplitter$FileIterator.hasNextLine(FileSplitter.java:350)
	at org.springframework.integration.file.splitter.FileSplitter$FileIterator.hasNext(FileSplitter.java:334)
	... 74 more
```
`GcsSession` code:
```java
@Override
public InputStream readRaw(String source) throws IOException {
	String[] tokens = getBucketAndObjectFromPath(source);
	Assert.state(tokens.length == 2, "Can only write to files, not buckets.");
	return Channels.newInputStream(this.gcs.reader(tokens[0], tokens[1]));
}
```
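One possible mitigation (a sketch only, not the project's API; `EagerStreams` and its `buffer` method are hypothetical names) would be for `readRaw` to drain the channel into memory up front, so that a later delete or rename cannot invalidate reads:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper (not part of Spring Cloud GCP): drains a remote stream
// into memory so that deleting or renaming the remote object afterwards
// cannot invalidate later reads. Trades memory for safety.
final class EagerStreams {

    private EagerStreams() {
    }

    static InputStream buffer(InputStream remote) throws IOException {
        try (InputStream in = remote) {
            // readAllBytes() pulls the full object now, while it still exists.
            return new ByteArrayInputStream(in.readAllBytes());
        }
    }
}
```

With something like this, `readRaw` could return `EagerStreams.buffer(Channels.newInputStream(this.gcs.reader(tokens[0], tokens[1])))`; the google-cloud-storage `Storage#readAllBytes(bucket, blob)` method would achieve the same eager download directly. The obvious downside is that very large objects no longer stream.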
I presume that it's related to the nature of the `InputStream` being used, and to when the data is made available to the stream. It seems that with `sun.nio.ch.ChannelInputStream` the data might not be available until the first read, whereas with Apache's `SftpInputStreamAsync` (an `InputStreamWithChannel`) it is.
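The lazy behaviour can be reproduced without GCS at all. Below is a minimal, self-contained sketch (the `FakeBlobChannel` class is an invented stand-in for `BlobReadChannel`) showing that a channel-backed stream fails once the backing object is "deleted", while a copy buffered before the delete keeps working:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.StandardCharsets;

class LazyReadDemo {

    // Invented stand-in for BlobReadChannel: starts failing with a "404"
    // once the backing "blob" is deleted, just like the real channel.
    static class FakeBlobChannel implements ReadableByteChannel {
        private final ByteBuffer data;
        private boolean open = true;
        boolean deleted = false; // flipped by the "FileDeletingMessageAdvice"

        FakeBlobChannel(byte[] bytes) {
            this.data = ByteBuffer.wrap(bytes);
        }

        @Override
        public int read(ByteBuffer dst) throws IOException {
            if (deleted) {
                throw new IOException("404 Not Found: no such object");
            }
            if (!data.hasRemaining()) {
                return -1;
            }
            int n = Math.min(dst.remaining(), data.remaining());
            for (int i = 0; i < n; i++) {
                dst.put(data.get());
            }
            return n;
        }

        @Override
        public boolean isOpen() {
            return open;
        }

        @Override
        public void close() {
            open = false;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] blob = "line1\nline2\n".getBytes(StandardCharsets.UTF_8);

        // 1. Lazy stream, the shape GcsSession.readRaw builds today.
        FakeBlobChannel lazy = new FakeBlobChannel(blob);
        InputStream lazyIn = Channels.newInputStream(lazy);
        lazy.deleted = true; // afterReceive deletes the remote file first
        try {
            lazyIn.read();
            System.out.println("lazy read: ok");
        } catch (IOException e) {
            System.out.println("lazy read failed: " + e.getMessage());
        }

        // 2. Eagerly buffered stream: copy the bytes before the delete runs.
        FakeBlobChannel eager = new FakeBlobChannel(blob);
        byte[] copy = Channels.newInputStream(eager).readAllBytes();
        InputStream bufferedIn = new ByteArrayInputStream(copy);
        eager.deleted = true; // the delete no longer matters
        System.out.println("buffered read ok, first byte: " + bufferedIn.read());
    }
}
```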
In summary, if the streaming `IntegrationFlow`, where the file is deleted during `afterReceive` of `org.springframework.integration.aop.MessageSourceMutator`, is ever going to work, then a different `InputStream`, or some manipulation of the `InputStream`, will be needed. If it is merely a coincidence that the sftp stream-application works, then that's bad.
Issue Analytics
- State:
- Created: a year ago
- Comments: 11 (5 by maintainers)
Top GitHub Comments
You know, that code gives me a hint that the `Closeable` we provide in a header could be not just an `InputStream`, but some wrapper which can perform a delete operation as well. The `AbstractRemoteFileStreamingMessageSource` may be supplied with a `boolean deleteRemoteFile` option which would lead to that extra logic. Probably need to sleep on this idea.
Feel free to transfer this issue to https://github.com/spring-projects/spring-integration/issues since I believe there is nothing we can do in Spring Cloud GCP.
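A minimal sketch of that wrapper idea (the `DeleteOnCloseInputStream` class below is hypothetical; the real header/`Closeable` wiring in Spring Integration would differ): the delete action runs only once the consumer finishes reading and closes the stream:

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical wrapper: the Closeable placed in the message header deletes
// (or renames) the remote file only when the stream itself is closed, so
// every read happens while the object still exists.
class DeleteOnCloseInputStream extends FilterInputStream {

    private final Runnable deleteAction; // e.g. () -> session.remove(path)
    private boolean closed;

    DeleteOnCloseInputStream(InputStream delegate, Runnable deleteAction) {
        super(delegate);
        this.deleteAction = deleteAction;
    }

    @Override
    public void close() throws IOException {
        if (!closed) {
            closed = true; // make close() idempotent
            try {
                super.close();
            } finally {
                deleteAction.run(); // the file is gone only after the last read
            }
        }
    }
}
```

A `boolean deleteRemoteFile` on the message source could then decide whether to wrap the raw stream this way instead of deleting in `afterReceive`.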
@artembilan Thanks for your input on this. I was hoping it was not a coincidence it worked with ftp/sftp, and that it may have been an implementation issue here.
There are definitely two approaches to getting the desired behaviour:

1. Copy the `InputStream` before the delete or rename happens.
2. Delete or rename only after the `InputStream` is closed. This seems like the correct approach.

For the first approach, I was able to keep the existing `MessagingAdvice` and add a new one, ensuring it fires first. Its `afterReceive` method duplicates the `Stream`.

In the second approach, the `InputStream` is closed in the `FileSplitter`. The delete or rename information would have to be provided to the `FileSplitter`, and the decision made in the close method. I suspect it would be around here