[BUG] DataLakeFileAsyncClient upload() reads the entire file into memory, causing OutOfMemoryError on large files
Describe the bug
When using the DataLakeFileAsyncClient on very large files (10 GB and up), the JVM dies with an OutOfMemoryError: instead of streaming, the upload reads the file into memory until the heap is exhausted.
Exception or Stack Trace
```
Error was received while reading the incoming data. The connection will be closed.
java.lang.OutOfMemoryError: Java heap space
	at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
	at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
	at org.springframework.core.io.buffer.DefaultDataBufferFactory.allocateBuffer(DefaultDataBufferFactory.java:89)
	at org.springframework.core.io.buffer.DefaultDataBufferFactory.allocateBuffer(DefaultDataBufferFactory.java:32)
	at org.springframework.core.io.buffer.DataBufferUtils$ReadableByteChannelGenerator.accept(DataBufferUtils.java:637)
	at org.springframework.core.io.buffer.DataBufferUtils$ReadableByteChannelGenerator.accept(DataBufferUtils.java:618)
	at reactor.core.publisher.FluxGenerate.lambda$new$1(FluxGenerate.java:56)
	at reactor.core.publisher.FluxGenerate$$Lambda$263/1703559123.apply(Unknown Source)
	at reactor.core.publisher.FluxGenerate$GenerateSubscription.fastPath(FluxGenerate.java:223)
	at reactor.core.publisher.FluxGenerate$GenerateSubscription.request(FluxGenerate.java:202)
	at reactor.core.publisher.FluxUsing$UsingFuseableSubscriber.request(FluxUsing.java:317)
	at reactor.core.publisher.FluxDoFinally$DoFinallySubscriber.request(FluxDoFinally.java:150)
	at reactor.core.publisher.FluxMapFuseable$MapFuseableConditionalSubscriber.request(FluxMapFuseable.java:346)
	at reactor.core.publisher.FluxFilterFuseable$FilterFuseableSubscriber.request(FluxFilterFuseable.java:184)
	at reactor.core.publisher.FluxWindowPredicate$WindowPredicateMain.onSubscribe(FluxWindowPredicate.java:180)
	at reactor.core.publisher.FluxFilterFuseable$FilterFuseableSubscriber.onSubscribe(FluxFilterFuseable.java:81)
	at reactor.core.publisher.FluxMapFuseable$MapFuseableConditionalSubscriber.onSubscribe(FluxMapFuseable.java:255)
	at reactor.core.publisher.FluxDoFinally$DoFinallySubscriber.onSubscribe(FluxDoFinally.java:117)
	at reactor.core.publisher.FluxUsing$UsingFuseableSubscriber.onSubscribe(FluxUsing.java:344)
	at reactor.core.publisher.FluxGenerate.subscribe(FluxGenerate.java:83)
	at reactor.core.publisher.FluxUsing.subscribe(FluxUsing.java:102)
	at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:55)
	at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.drain(MonoIgnoreThen.java:153)
	at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.ignoreDone(MonoIgnoreThen.java:190)
	at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreInner.onComplete(MonoIgnoreThen.java:240)
	at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1706)
	at reactor.core.publisher.MonoFlatMap$FlatMapInner.onNext(MonoFlatMap.java:241)
	at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onNext(FluxMapFuseable.java:121)
	at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1705)
	at reactor.core.publisher.MonoFlatMap$FlatMapInner.onNext(MonoFlatMap.java:241)
	at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1705)
	at reactor.core.publisher.MonoIgnoreThen$ThenAcceptInner.onNext(MonoIgnoreThen.java:296)

[reactor-http-epoll-2] WARN io.netty.channel.AbstractChannelHandlerContext - An exception 'java.lang.OutOfMemoryError: Java heap space' [enable DEBUG level for full stacktrace] was thrown by a user handler's exceptionCaught() method while handling the following exception:
java.lang.OutOfMemoryError: Java heap space
	at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
	at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
	... (remaining frames identical to the stack trace above)
```
To Reproduce
Attempt to upload any file large enough to fill up the memory on the machine.
Code Snippet
```java
import static java.nio.file.StandardOpenOption.READ;

import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.file.Files;

import org.springframework.core.io.buffer.DataBuffer;
import org.springframework.core.io.buffer.DataBufferUtils;
import org.springframework.core.io.buffer.DefaultDataBufferFactory;

import com.azure.storage.file.datalake.DataLakeFileAsyncClient;

import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

// fileSystemClient, path, targetPath, PARALLEL_TRANSFER_OPTIONS and LOG are
// defined elsewhere in the application.
DataLakeFileAsyncClient fileAsyncClient = fileSystemClient.getFileAsyncClient(targetPath);
InputStream is = Files.newInputStream(path, READ);

// Read the file as a stream of 1 MiB buffers, closing the InputStream when done.
Flux<DataBuffer> fileBuffer = DataBufferUtils
    .readInputStream(() -> is, new DefaultDataBufferFactory(), 1024 * 1024)
    .doFinally(signal -> {
        try {
            is.close();
        } catch (Exception e) {
            LOG.error("Exception attempting to close stream ", e);
        }
    });
Flux<ByteBuffer> byteBufferFlux = fileBuffer.map(DataBuffer::asByteBuffer);

// Delete the target file if it already exists.
fileAsyncClient.exists()
    .flatMap(exists -> {
        if (exists) {
            return fileAsyncClient.delete();
        }
        return Mono.empty();
    })
    .block();

// Upload the file; this is the call that exhausts the heap on large files.
fileAsyncClient.upload(byteBufferFlux, PARALLEL_TRANSFER_OPTIONS, false)
    .doOnError(Throwable::printStackTrace)
    .block();
```
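The snippet does not show how PARALLEL_TRANSFER_OPTIONS is constructed. As a point of reference only, here is a minimal sketch of a possible definition, assuming the setter-style ParallelTransferOptions from com.azure.storage.blob.models that the Data Lake upload methods accept in recent SDK versions; the sizes below are illustrative, not taken from the report:

```java
import com.azure.storage.blob.models.ParallelTransferOptions;

// Hypothetical definition; block size and concurrency are illustrative values.
static final ParallelTransferOptions PARALLEL_TRANSFER_OPTIONS =
    new ParallelTransferOptions()
        .setBlockSizeLong(8L * 1024 * 1024) // stage 8 MiB blocks per append
        .setMaxConcurrency(4);              // at most 4 blocks in flight at once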
Expected behavior
The expected result is that the upload method would buffer "lazily", i.e. read from the input only as needed while uploading/appending chunks to the object storage.
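For illustration, here is a minimal sketch of that kind of demand-driven reading, using Reactor's Flux.generate so that exactly one buffer is allocated per downstream request; the helper name lazyChunks is hypothetical and not part of the SDK:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

import reactor.core.publisher.Flux;

// Emits one chunk per downstream request, so memory use is bounded by
// chunkSize regardless of the file size.
static Flux<ByteBuffer> lazyChunks(Path file, int chunkSize) {
    return Flux.generate(
        () -> FileChannel.open(file, StandardOpenOption.READ), // opened on subscribe
        (channel, sink) -> {
            try {
                ByteBuffer chunk = ByteBuffer.allocate(chunkSize);
                if (channel.read(chunk) < 0) {
                    sink.complete();  // end of file
                } else {
                    chunk.flip();
                    sink.next(chunk); // at most one emission per request
                }
            } catch (IOException e) {
                sink.error(e);
            }
            return channel;
        },
        channel -> {
            try {
                channel.close();      // released on complete, error, or cancel
            } catch (IOException ignored) {
                // nothing useful to do if close fails
            }
        });
}
```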
Setup (please complete the following information):
- OS: Tested on both Windows and Linux
- IDE : Eclipse and IntelliJ
- Version of the Library used:
```xml
<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-storage-file-datalake</artifactId>
    <version>12.1.0</version>
</dependency>
```
Information Checklist
Kindly make sure that you have added all the following information above and checked off the required fields; otherwise we will treat the issue as an incomplete report.
- Bug Description Added
- Repro Steps Added
- Setup information Added
@rickle-msft Hi, I think we figured it out: by accident we had used the .dfs endpoint instead of the .blob endpoint when testing with the AppendBlobClient.
We changed to the .blob endpoint and also switched to BlockBlobClient, and now the copy from InputStream to OutputStream seems to work.
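A minimal sketch of that workaround, assuming the azure-storage-blob BlockBlobClient output-stream API; containerClient, path, and targetPath are placeholders for the reporter's own variables:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

import com.azure.storage.blob.BlobContainerClient;
import com.azure.storage.blob.specialized.BlockBlobClient;

// Copies the local file into Blob Storage through the block blob's
// OutputStream, which stages and commits blocks as data arrives rather
// than buffering the whole file in memory.
static void uploadViaBlockBlob(BlobContainerClient containerClient,
                               Path path, String targetPath) throws IOException {
    BlockBlobClient blockBlob =
        containerClient.getBlobClient(targetPath).getBlockBlobClient();
    try (InputStream in = Files.newInputStream(path);
         OutputStream out = blockBlob.getBlobOutputStream(true)) { // true = overwrite
        in.transferTo(out); // Java 9+; copies in small, bounded chunks
    }
}
```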
I am going to close this issue as the fix has been merged into master and we have had reports of success when using the new beta version.