
GcsSession.readRaw Does Not Play Nice With MessageSourceMutator afterReceive (for deleting) When Streaming


Please bear with me on this one.

I’m creating a supplier stream-application for spring-cloud-dataflow that picks up files from a GCS bucket and emits the data contained in them. The supplier function is built almost verbatim from the existing sftp stream-application. When a file is received, an org.springframework.integration.aop.MessageSourceMutator hook removes the file (FileDeletingMessageAdvice), and a subsequent step reads the channel and emits the data. Almost everything works correctly with spring-integration, except for the following:

  1. The repo is missing several files needed for full integration that I had to supplement (mostly DSL files such as GcsInboundChannelAdapterSpec, GcsOutboundGatewaySpec, GcsStreamingInboundChannelAdapterSpec, and a class to expose the adapters). I can submit a PR/MR if you want.
  2. When streaming, GcsSession.readRaw returns a sun.nio.ch.ChannelInputStream wrapping a BlobReadChannel. When the afterReceive function of the MessageSourceMutator runs, it removes the file, and any subsequent attempt to read from the InputStream returns a 404 error (since the file is no longer there). The same thing happens during a rename:
2022-11-01 12:44:47.458 ERROR 65889 --- [oundedElastic-2] o.s.integration.handler.LoggingHandler   : org.springframework.messaging.MessageHandlingException: IOException while iterating; nested exception is java.io.IOException: com.google.cloud.RetryHelper$RetryHelperException: com.google.cloud.storage.StorageException: 404 Not Found
GET https://storage.googleapis.com/download/storage/v1/b/<bucket-name>/o/<filename>?alt=media
No such object: <bucket-name>/<filename>, failedMessage=GenericMessage [payload=sun.nio.ch.ChannelInputStream@1d6c033b, headers={file_remoteHostPort=storage.googleapis.com:443, file_remoteFileInfo={"directory":false,"filename":"<filename>","link":false,"modified":1667312728832,"permissions":"Use [BlobInfo.getAcl()] to obtain permissions.","remoteDirectory":"<bucket-name>","size":149619}, file_remoteDirectory=<bucket-name>, id=<id>, contentType=text/plain, closeableResource=org.springframework.integration.file.remote.session.CachingSessionFactory$CachedSession@6e67b353, file_remoteFile=<filename>, timestamp=1667321087373}]
	at org.springframework.integration.file.splitter.FileSplitter$FileIterator.hasNext(FileSplitter.java:344)
	at reactor.core.publisher.FluxIterable.subscribe(FluxIterable.java:133)
<snip>
Caused by: java.io.IOException: com.google.cloud.RetryHelper$RetryHelperException: com.google.cloud.storage.StorageException: 404 Not Found
GET https://storage.googleapis.com/download/storage/v1/b/<bucket-name>/o/<filename>?alt=media
No such object: <bucket-name>/<filename>
	at com.google.cloud.storage.BlobReadChannel.read(BlobReadChannel.java:149)
	at java.base/sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:65)
	at java.base/sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:107)
	at java.base/sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:101)
	at java.base/sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:270)
	at java.base/sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:313)
	at java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:188)
	at java.base/java.io.InputStreamReader.read(InputStreamReader.java:177)
	at java.base/java.io.BufferedReader.fill(BufferedReader.java:162)
	at java.base/java.io.BufferedReader.readLine(BufferedReader.java:329)
	at java.base/java.io.BufferedReader.readLine(BufferedReader.java:396)
	at org.springframework.integration.file.splitter.FileSplitter$FileIterator.hasNextLine(FileSplitter.java:350)
	at org.springframework.integration.file.splitter.FileSplitter$FileIterator.hasNext(FileSplitter.java:334)
	... 74 more

GcsSession Code:

  @Override
  public InputStream readRaw(String source) throws IOException {
    String[] tokens = getBucketAndObjectFromPath(source);
    Assert.state(tokens.length == 2, "Can only write to files, not buckets.");
    // Returns a stream over a lazy BlobReadChannel: no bytes are fetched from
    // GCS until the first read() on the returned InputStream.
    return Channels.newInputStream(this.gcs.reader(tokens[0], tokens[1]));
  }

I presume this is related to the nature of the InputStream being used, and to when the data is actually made available to the stream. It seems that with sun.nio.ch.ChannelInputStream no data is fetched until the first read, whereas Apache’s SftpInputStreamAsync (an InputStreamWithChannel) starts pulling data as soon as it is created.
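
A minimal sketch of that lazy behaviour (assuming a plain com.google.cloud.storage.Storage client; the bucket and object names are placeholders):

import java.io.InputStream;
import java.nio.channels.Channels;

import com.google.cloud.ReadChannel;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;

public class LazyReadDemo {

	public static void main(String[] args) throws Exception {
		Storage storage = StorageOptions.getDefaultInstance().getService();

		ReadChannel reader = storage.reader("some-bucket", "some-object"); // no HTTP request yet
		InputStream in = Channels.newInputStream(reader);

		storage.delete("some-bucket", "some-object"); // remote object is now gone

		in.read(); // first actual GET happens here -> StorageException: 404 Not Found
	}
}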

In summary: if the streaming IntegrationFlow, where the file is deleted during the afterReceive of an org.springframework.integration.aop.MessageSourceMutator, is ever going to work, then either a different kind of InputStream must be used or the stream must be buffered before the delete. And if the sftp stream-application only works by coincidence, that is bad in itself.

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 11 (5 by maintainers)

Top GitHub Comments

1 reaction
artembilan commented, Nov 3, 2022

You know that code:

Closeable closeableResource = StaticMessageHeaderAccessor.getCloseableResource(message);
if (closeableResource != null) {
	closeableResource.close();
}

That gives me a hint that the Closeable we provide in a header could be not just an InputStream, but a wrapper that can perform the delete operation as well. The AbstractRemoteFileStreamingMessageSource could be supplied with a boolean deleteRemoteFile option that would enable that extra logic.
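
A minimal sketch of that wrapper idea (DeletingInputStream is a hypothetical name, not an existing class; Session is Spring Integration’s remote-session abstraction):

import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.springframework.integration.file.remote.session.Session;

// Hypothetical wrapper: the remote file is deleted only when the consumer
// closes the stream, not when the message is received.
public class DeletingInputStream extends FilterInputStream {

	private final Session<?> session;

	private final String remotePath;

	public DeletingInputStream(InputStream delegate, Session<?> session, String remotePath) {
		super(delegate);
		this.session = session;
		this.remotePath = remotePath;
	}

	@Override
	public void close() throws IOException {
		try {
			super.close(); // finish with the underlying channel first
		}
		finally {
			try {
				this.session.remove(this.remotePath); // delete only after the data has been consumed
			}
			finally {
				this.session.close();
			}
		}
	}
}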

Probably need to sleep on this idea.

Feel free to transfer this issue into https://github.com/spring-projects/spring-integration/issues since I believe there is nothing we can do in Spring Cloud GCP.

1 reaction
fiidim commented, Nov 3, 2022

@artembilan Thanks for your input on this. I was hoping it was not a coincidence it worked with ftp/sftp, and that it may have been an implementation issue here.

There are definitely two approaches to getting the desired behaviour:

  1. Duplicate the InputStream before the delete or rename happens.
  2. Defer the delete/rename until after the InputStream is closed. This seems like the correct approach.

For the first approach, I was able to keep the existing message advice and add a new one, ensuring it fires first. Its afterReceive method duplicates the stream:

import java.io.IOException;
import java.io.InputStream;

import org.apache.commons.io.IOUtils;
import org.springframework.integration.aop.MessageSourceMutator;
import org.springframework.integration.core.MessageSource;
import org.springframework.integration.support.MessageBuilder;
import org.springframework.lang.Nullable;
import org.springframework.messaging.Message;

public class RemoteFileCopyStreamAdvice implements MessageSourceMutator {

	@Nullable
	@Override
	public Message<?> afterReceive(@Nullable Message<?> result, final MessageSource<?> source) {
		return result == null || !(result.getPayload() instanceof InputStream) ?
				result : convertStream(result);
	}

	public Message<?> convertStream(Message<?> result) {
		try {
			// Drain the remote stream into an in-memory buffer so the payload
			// survives the delete/rename performed by the next advice.
			return MessageBuilder.withPayload(IOUtils.toBufferedInputStream((InputStream) result.getPayload()))
					.copyHeaders(result.getHeaders())
					.build();
		}
		catch (IOException e) {
			// Swallowing the exception would silently hand back the partially
			// consumed original stream; fail fast instead.
			throw new IllegalStateException("Failed to buffer remote stream", e);
		}
	}
}
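
For reference, a hedged sketch of how such an advice could be wired into the supplier flow with the Java DSL. The gcsStreamingMessageSource and fileDeletingAdvice beans, the "lines" channel, and the poll interval are all assumptions, not from the original issue:

import java.io.InputStream;

import org.aopalliance.aop.Advice;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.core.MessageSource;
import org.springframework.integration.dsl.IntegrationFlow;
import org.springframework.integration.dsl.IntegrationFlows;
import org.springframework.integration.file.dsl.Files;

@Configuration
public class GcsSupplierConfig {

	@Bean
	public IntegrationFlow gcsSupplierFlow(MessageSource<InputStream> gcsStreamingMessageSource,
			Advice fileDeletingAdvice) {
		return IntegrationFlows
				.from(gcsStreamingMessageSource,
						e -> e.poller(p -> p.fixedDelay(5000)
								// Listing the delete advice first should make the buffering
								// advice innermost, so its afterReceive runs before the delete;
								// verify the ordering in your own setup.
								.advice(fileDeletingAdvice, new RemoteFileCopyStreamAdvice())))
				.split(Files.splitter())
				.channel("lines")
				.get();
	}
}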

In the second approach, the InputStream is closed in the FileSplitter. The delete or rename information would have to be provided to the FileSplitter, with the decision made in its close method. I suspect it would be around here.
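
Alternatively, without touching FileSplitter at all, an advice could wrap the payload in the deleting wrapper sketched earlier, so the delete fires when the FileSplitter closes the stream. DeferredDeleteAdvice is a hypothetical name, and the path handling (GCS reports the bucket as the remote directory, per the stack trace above) is an assumption:

import java.io.InputStream;

import org.springframework.integration.aop.MessageSourceMutator;
import org.springframework.integration.core.MessageSource;
import org.springframework.integration.file.FileHeaders;
import org.springframework.integration.file.remote.session.SessionFactory;
import org.springframework.integration.support.MessageBuilder;
import org.springframework.lang.Nullable;
import org.springframework.messaging.Message;
import org.springframework.messaging.MessageHeaders;

// Hypothetical advice: instead of deleting eagerly in afterReceive, wrap the
// payload so the remote delete runs when the downstream consumer (for example
// the FileSplitter) closes the stream.
public class DeferredDeleteAdvice implements MessageSourceMutator {

	private final SessionFactory<?> sessionFactory;

	public DeferredDeleteAdvice(SessionFactory<?> sessionFactory) {
		this.sessionFactory = sessionFactory;
	}

	@Nullable
	@Override
	public Message<?> afterReceive(@Nullable Message<?> result, MessageSource<?> source) {
		if (result == null || !(result.getPayload() instanceof InputStream)) {
			return result;
		}
		MessageHeaders headers = result.getHeaders();
		// For GCS the remote directory header carries the bucket name, so
		// bucket/object re-forms the full path expected by Session.remove().
		String remotePath = headers.get(FileHeaders.REMOTE_DIRECTORY, String.class)
				+ "/" + headers.get(FileHeaders.REMOTE_FILE, String.class);
		InputStream deferred = new DeletingInputStream(
				(InputStream) result.getPayload(), this.sessionFactory.getSession(), remotePath);
		return MessageBuilder.withPayload(deferred).copyHeaders(headers).build();
	}
}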
