Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging third-party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

[BUG] DataLakeFileAsyncClient upload() method has memory leak causing file to be read into memory - causes OOM

See original GitHub issue

Describe the bug When using the DataLakeFileAsyncClient to upload very large files (10 GB and up), the JVM crashes with an OutOfMemoryError: instead of being streamed, the file is buffered into heap memory, which behaves like a memory leak.

Exception or Stack Trace

```
Error was received while reading the incoming data. The connection will be closed.
java.lang.OutOfMemoryError: Java heap space
	at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
	at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
	at org.springframework.core.io.buffer.DefaultDataBufferFactory.allocateBuffer(DefaultDataBufferFactory.java:89)
	at org.springframework.core.io.buffer.DefaultDataBufferFactory.allocateBuffer(DefaultDataBufferFactory.java:32)
	at org.springframework.core.io.buffer.DataBufferUtils$ReadableByteChannelGenerator.accept(DataBufferUtils.java:637)
	at org.springframework.core.io.buffer.DataBufferUtils$ReadableByteChannelGenerator.accept(DataBufferUtils.java:618)
	at reactor.core.publisher.FluxGenerate.lambda$new$1(FluxGenerate.java:56)
	at reactor.core.publisher.FluxGenerate$$Lambda$263/1703559123.apply(Unknown Source)
	at reactor.core.publisher.FluxGenerate$GenerateSubscription.fastPath(FluxGenerate.java:223)
	at reactor.core.publisher.FluxGenerate$GenerateSubscription.request(FluxGenerate.java:202)
	at reactor.core.publisher.FluxUsing$UsingFuseableSubscriber.request(FluxUsing.java:317)
	at reactor.core.publisher.FluxDoFinally$DoFinallySubscriber.request(FluxDoFinally.java:150)
	at reactor.core.publisher.FluxMapFuseable$MapFuseableConditionalSubscriber.request(FluxMapFuseable.java:346)
	at reactor.core.publisher.FluxFilterFuseable$FilterFuseableSubscriber.request(FluxFilterFuseable.java:184)
	at reactor.core.publisher.FluxWindowPredicate$WindowPredicateMain.onSubscribe(FluxWindowPredicate.java:180)
	at reactor.core.publisher.FluxFilterFuseable$FilterFuseableSubscriber.onSubscribe(FluxFilterFuseable.java:81)
	at reactor.core.publisher.FluxMapFuseable$MapFuseableConditionalSubscriber.onSubscribe(FluxMapFuseable.java:255)
	at reactor.core.publisher.FluxDoFinally$DoFinallySubscriber.onSubscribe(FluxDoFinally.java:117)
	at reactor.core.publisher.FluxUsing$UsingFuseableSubscriber.onSubscribe(FluxUsing.java:344)
	at reactor.core.publisher.FluxGenerate.subscribe(FluxGenerate.java:83)
	at reactor.core.publisher.FluxUsing.subscribe(FluxUsing.java:102)
	at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:55)
	at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.drain(MonoIgnoreThen.java:153)
	at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.ignoreDone(MonoIgnoreThen.java:190)
	at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreInner.onComplete(MonoIgnoreThen.java:240)
	at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1706)
	at reactor.core.publisher.MonoFlatMap$FlatMapInner.onNext(MonoFlatMap.java:241)
	at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onNext(FluxMapFuseable.java:121)
	at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1705)
	at reactor.core.publisher.MonoFlatMap$FlatMapInner.onNext(MonoFlatMap.java:241)
	at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1705)
	at reactor.core.publisher.MonoIgnoreThen$ThenAcceptInner.onNext(MonoIgnoreThen.java:296)
[reactor-http-epoll-2] WARN io.netty.channel.AbstractChannelHandlerContext - An exception 'java.lang.OutOfMemoryError: Java heap space' [enable DEBUG level for full stacktrace] was thrown by a user handler's exceptionCaught() method while handling the following exception:
java.lang.OutOfMemoryError: Java heap space
	(stack frames identical to the trace above)
```
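To see why this fails at these sizes, it helps to do the arithmetic. The sketch below uses only figures taken from the report itself: a 10 GB file read in 1 MiB chunks (the `1024*1024` buffer size in the reporter's snippet); the class name and the few-GB heap figure in the comment are illustrative assumptions.

```java
public class OomArithmetic {
    public static void main(String[] args) {
        // Figures from the report: a 10 GiB file read in 1 MiB DataBuffer chunks.
        long fileSize = 10L * 1024 * 1024 * 1024; // 10 GiB
        int chunkSize = 1024 * 1024;              // 1 MiB, matching the snippet below
        long buffersHeld = fileSize / chunkSize;

        // If upload() keeps every chunk reachable instead of releasing each one
        // after it is appended, all of these buffers sit on the heap at once,
        // which exceeds a typical few-GB -Xmx and produces the
        // java.lang.OutOfMemoryError shown above.
        System.out.println(buffersHeld + " buffers (~"
                + (buffersHeld * chunkSize) / (1024L * 1024 * 1024) + " GiB) held at once");
    }
}
```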

To Reproduce Attempt to upload any file large enough to fill up the memory on the machine.

Code Snippet

```java
DataLakeFileAsyncClient fileAsyncClient = fileSystemClient.getFileAsyncClient(targetPath);
InputStream is = Files.newInputStream(path, READ);

Flux<DataBuffer> fileBuffer = DataBufferUtils.readInputStream(
        () -> is,
        new DefaultDataBufferFactory(), 1024 * 1024).doFinally(__ -> {
    try {
        is.close();
    } catch (Exception e) {
        LOG.error("Exception attempting to close stream ", e);
    }
});
Flux<ByteBuffer> byteBufferFlux = fileBuffer.map(DataBuffer::asByteBuffer);

fileAsyncClient.exists()
        .flatMap(b -> {
            if (b) {
                return fileAsyncClient.delete();
            } else {
                return Mono.empty();
            }
        })
        .block();

fileAsyncClient.upload(byteBufferFlux, PARALLEL_TRANSFER_OPTIONS, false)
        .doOnError(Throwable::printStackTrace)
        .block();
```

Expected behavior The expected result is that the upload method would do "lazy" buffering, i.e. reading chunks only as needed while uploading/appending them to the object storage.
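A minimal, stdlib-only sketch of the lazy behavior the reporter expects: only one fixed-size chunk is ever on the heap, no matter how large the file is. The `uploadChunk` method is a hypothetical stand-in for the SDK's per-chunk append call, not a real API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class LazyUploadSketch {
    static final int CHUNK = 1024 * 1024; // 1 MiB, matching the reporter's buffer size

    // Hypothetical stand-in for appending one chunk to remote object storage.
    static void uploadChunk(byte[] buf, int len) { /* append len bytes remotely */ }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("lazy", ".bin");
        Files.write(path, new byte[3 * CHUNK + 123]); // sample payload

        long total = 0;
        byte[] buf = new byte[CHUNK];
        try (InputStream is = Files.newInputStream(path)) {
            int n;
            // Heap usage stays bounded: one CHUNK-sized buffer, reused per read.
            while ((n = is.read(buf)) != -1) {
                uploadChunk(buf, n);
                total += n;
            }
        }
        System.out.println(total); // 3145851 bytes streamed through one 1 MiB buffer
        Files.delete(path);
    }
}
```

The contrast with the reported behavior is exactly this loop: the buffer is reused after each append rather than every chunk staying reachable until the upload completes.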

Setup (please complete the following information):

  • OS: Tested on both Windows and Linux
  • IDE : Eclipse and IntelliJ
  • Version of the Library used:

```xml
<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-storage-file-datalake</artifactId>
    <version>12.1.0</version>
</dependency>
```

Information Checklist Kindly make sure that you have added all the following information above and checked off the required fields, otherwise we will treat the issue as an incomplete report

  • Bug Description Added
  • Repro Steps Added
  • Setup information Added

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 25 (7 by maintainers)

Top GitHub Comments

1 reaction
roneiv commented, May 11, 2020

@rickle-msft Hi, I think we figured it out: by accident we had used the .dfs endpoint instead of the .blob endpoint when testing with the AppendBlobClient.

We changed to the .blob endpoint and also switched to BlockBlobClient, and then copying the InputStream to the OutputStream seems to work.
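The workaround they describe boils down to a bounded-buffer copy from an InputStream into an OutputStream. A stdlib-only sketch of that copy, with ByteArrayInputStream/ByteArrayOutputStream standing in for the file stream and the blob output stream (the real sink would come from the BlockBlobClient; that wiring is omitted here):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class StreamCopySketch {
    // Copies in fixed-size chunks so heap usage stays bounded
    // regardless of how large the input is.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = new byte[20000]; // sample payload
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stands in for the blob OutputStream
        long copied = copy(new ByteArrayInputStream(payload), sink);
        System.out.println(copied); // 20000
    }
}
```

On Java 9+ the loop can be replaced by `in.transferTo(out)`, which does the same bounded-buffer copy.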

0 reactions
rickle-msft commented, Jun 12, 2020

I am going to close this issue as the fix has been merged into master and we have had reports of success when using the new beta version.

Read more comments on GitHub >

Top Results From Across the Web

How we find and fix OOM and memory leaks in Java Services
1. Understanding OOM errors and identifying their causes · The application needs more memory than the OS can offer. · The Java application...
Read more >
Troubleshoot Dataflow out of memory errors - Google Cloud
This page provides information about memory usage in Dataflow pipelines and steps for investigating and resolving issues with Dataflow out of memory (OOM) ......
Read more >
Hunting Java Memory Leaks - Toptal
Java heap leaks: the classic memory leak, in which Java objects are continuously created without being released. This is usually caused by latent...
Read more >
Understanding Memory Leaks in Java - Baeldung
Memory leaks are a genuine problem in Java. In this tutorial, we'll learn what the potential causes of memory leaks are, how to...
Read more >
3.2 Understand the OutOfMemoryError Exception
Cause: The detail message Java heap space indicates object could not be allocated in the Java heap. This error does not necessarily imply...
Read more >
