
[Blob Storage] Issues with input/output stream uploads, especially large streams.

See original GitHub issue

Query/Question

As I just figured out in issue #5221, I am able to upload large amounts of data from files to block blobs via the BlockBlobClient#uploadFromFile(filePath) method.

However, I usually need stream operations, and those are giving me trouble. Example: a client uploads data to our backend as an octet-stream. The Spring backend already maps this to a Java InputStream, which I have to channel directly to the cloud storage.
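For context, the receiving endpoint is shaped roughly like this (a simplified sketch; the mapping, names and the cloudStorageClient wrapper are illustrative, not our exact code):

@PostMapping(value = "/files/{name}", consumes = MediaType.APPLICATION_OCTET_STREAM_VALUE)
public ResponseEntity<Void> uploadFile(@PathVariable final String name, final HttpServletRequest request) throws IOException
{
    // Spring hands us the raw request body as an InputStream; we want to pipe it
    // straight through to blob storage without buffering the whole payload.
    // Note: with chunked transfer encoding the content length is -1, which is part of the problem.
    cloudStorageClient.upload(name, request.getInputStream(), request.getContentLengthLong());
    return ResponseEntity.accepted().build();
}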

I tried the following:

  • BlockBlobClient#upload(inputStream, length): results in a RequestBodyTooLarge error when the content size is over the API limit of 256MB. How can I solve this? I would have expected the SDK to do the magic here just as it does for file uploads, which can already be many gigabytes with a single SDK upload method call. (A possible chunked workaround is sketched right after this list.)
  • I tried writing to the output stream of a blob directly via BlockBlobClient#getBlobOutputStream(), channeling an incoming InputStream on the fly via org.springframework.util.StreamUtils#copy(inputStream, outputStream). This operation:
    • is very slow compared to the file or input stream upload: with the latter I get speeds of 7Mb/s on my connection, while the output stream write shows only ~140KiB/s, and
    • results in completely wrong blob sizes on the storage: when I tried with a random 3Mb input stream, the blob size in the Azure portal is shown as just 4KiB.
    • I am also not sure, even if it were faster and produced a correct file size, whether it would work for large files.
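One possible workaround, sketched here only as an illustration (not the SDK's built-in behavior), is to split the incoming stream into fixed-size chunks and stage them manually via BlockBlobClient#stageBlock, committing the block list at the end. Memory stays bounded to a single chunk buffer; the chunk size, block ID scheme and helper names below are illustrative choices:

// Rough sketch: chunked upload of an arbitrary InputStream via staged blocks,
// assuming an existing BlockBlobClient named "blockBlobClient".
private void uploadInChunks(final InputStream inputStream) throws IOException
{
    final int chunkSize = 8 * 1024 * 1024;      // 8 MiB per block, well under the per-request limit
    final byte[] buffer = new byte[chunkSize];  // the only buffer held in memory
    final List<String> blockIds = new ArrayList<>();
    int blockNumber = 0;
    int bytesRead;
    while ((bytesRead = readChunk(inputStream, buffer)) > 0)
    {
        // Block IDs must be Base64 strings of equal length within one blob.
        final String blockId = Base64.getEncoder()
            .encodeToString(String.format("block-%07d", blockNumber++).getBytes(StandardCharsets.UTF_8));
        blockIds.add(blockId);
        blockBlobClient.stageBlock(blockId, new ByteArrayInputStream(buffer, 0, bytesRead), bytesRead);
    }
    blockBlobClient.commitBlockList(blockIds); // assembles the staged blocks into the final blob
}

// Read up to buffer.length bytes, looping until the buffer is full or the stream ends.
private int readChunk(final InputStream in, final byte[] buffer) throws IOException
{
    int total = 0;
    int read;
    while (total < buffer.length && (read = in.read(buffer, total, buffer.length - total)) != -1)
    {
        total += read;
    }
    return total;
}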

Generally, I cannot buffer in memory or on the disk of the host machines due to resource constraints; that wouldn't scale. The data can also be potentially large (several gigabytes): buffering it in memory would quickly cause out-of-memory errors for even a single client, and there may be a large number of parallel uploads from thousands of different clients. The same applies to disk space when using temp files as a workaround, so that's not an option either.

Code snippets

public void test()
{
    final Path file = Paths.get("~/Downloads/random-file.zip");
    try (final RandomAccessFile raf = new RandomAccessFile("~/Downloads/random-file.zip", "rw");
         final InputStream is = Files.newInputStream(file))
    {
        final long largeFileSize = 1024 * 1024 * 1024 * 10L; // 10Gb of size
        final long smallFileSize = 1024 * 1024 * 3L; // 3Mb of size
        raf.setLength(largeFileSize);
        //raf.setLength(smallFileSize); // -> use for trying out writing to output stream
        cloudStorageClient.upload("random-file.zip", is, largeFileSize); // -> fails, I would expect to be able to use it with any content length
        
        // writeToOutputStream(is); // -> this is very slow and produces false blob sizes.
    }
    catch (final IOException | CloudStorageException e)
    {
        LOG.error("Test failed", e);
    }
    finally
    {
        try
        {
            Files.deleteIfExists(file);
        }
        catch (final IOException e)
        {
            LOG.error("Test clean-up failed");
        }
    }
}

private void writeToOutputStream(final InputStream inputStream)
{
    try(final OutputStream outputStream = containerClient.getBlockBlobClient("random-file.zip").getBlobOutputStream())
    {
        LOG.debug("Channeling content from input stream to blob storage...");
        org.springframework.util.StreamUtils.copy(inputStream, outputStream);
    }
    catch (final IOException | StorageException e)
    {
        LOG.error("Test failed", e);
    }
}
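For comparison, here is a variant of the same copy that uses a larger explicit buffer (StreamUtils.copy works with a 4KiB buffer internally) and relies on try-with-resources/close() to commit the blob. This is a sketch only, reusing the names from the snippets above:

private void writeToOutputStreamBuffered(final InputStream inputStream)
{
    final byte[] buffer = new byte[4 * 1024 * 1024]; // 4 MiB copy buffer instead of the 4 KiB default
    try(final OutputStream outputStream = containerClient.getBlockBlobClient("random-file.zip").getBlobOutputStream())
    {
        int bytesRead;
        while ((bytesRead = inputStream.read(buffer)) != -1)
        {
            outputStream.write(buffer, 0, bytesRead);
        }
        // no explicit flush: closing the blob output stream (via try-with-resources) commits the upload
    }
    catch (final IOException | StorageException e)
    {
        LOG.error("Buffered copy failed", e);
    }
}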

Why is this not a Bug or a Feature Request? The input stream upload seems to adhere to the API limit of a 256MB request body, so it's technically not a bug. But I am not sure whether the SDK shouldn't handle this transparently in its implementation, which is why I am not filing a feature request for now.

The issue with writing to the output stream may be a bug, but I am not sure yet.

Setup (please complete the following information if applicable):

  • OS: Mac OS 10.14 (Mojave)
  • IDE: IntelliJ
  • Library version: 12.0.0-preview.2

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Reactions: 1
  • Comments: 14 (9 by maintainers)

Top GitHub Comments

1 reaction
cdraeger commented, Oct 16, 2019

Hi @rickle-msft, I was now able to test writing to the output stream again with preview 4:

  1. The content length issue seems to be fixed: I tested with random files of up to 50Mb and the blob showed the correct size in the portal afterwards.
  2. However, the upload speed was still very slow: ~250kB/s. In comparison, the input stream upload managed > 50Mbit/s, but it requires me to specify the content length beforehand, which I don't know when channeling streams on the fly, and it is still subject to the content length limit.

I basically copied a file InputStream to the blob OutputStream via the standard stream-utils copy method. While this can be slower than a direct input stream upload, it shouldn't be this slow, should it?
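For reference, a rough sketch of the buffered async upload that avoids specifying the content length up front. This assumes a recent azure-storage-blob release (the exact ParallelTransferOptions API and the availability of FluxUtil.toFluxByteBuffer differ across versions), and the connection string and names are placeholders:

// Build an async client for the target blob (placeholder credentials and names).
BlobAsyncClient asyncClient = new BlobClientBuilder()
        .connectionString(connectionString)
        .containerName("my-container")
        .blobName("random-file.zip")
        .buildAsyncClient();

// Bound memory usage: at most maxConcurrency blocks of blockSize bytes are buffered at once.
ParallelTransferOptions options = new ParallelTransferOptions()
        .setBlockSizeLong(8L * 1024 * 1024)   // 8 MiB blocks
        .setMaxConcurrency(4);                // up to 4 blocks in flight

// The SDK chunks the stream into blocks itself, so no total length is needed.
asyncClient.upload(FluxUtil.toFluxByteBuffer(inputStream), options)
        .block();                             // blocking here only to keep the example short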

0 reactions
msftbot[bot] commented, Jul 21, 2020

Hi, we’re sending this friendly reminder because we haven’t heard back from you in a while. We need more information about this issue to help address it. Please be sure to give us your input within the next 7 days. If we don’t hear back from you within 14 days of this comment the issue will be automatically closed. Thank you!

Read more comments on GitHub >

