[Blob Storage] Issues with input/output stream uploads, especially large streams.
See original GitHub issue

Query/Question
As figured out in issue #5221, I am able to upload large amounts of data from files to block blobs via the BlockBlobClient#uploadFromFile(filePath) method.
However, I usually need stream operations, and with those I have issues. Example: a client uploads data to our backend as an octet-stream. The Spring backend already maps this to a Java InputStream, which I have to channel directly to the cloud storage.
I tried the following:
- BlockBlobClient#upload(inputStream, length): results in a RequestBodyTooLarge error when the content size is over the API limit of 256 MB per request body. How can I solve this? I would have expected the SDK to do the magic here, as it does for file uploads, where a single SDK method call can already upload many gigabytes.
- Writing to the output stream of a blob directly via BlockBlobClient#getBlobOutputStream(), channeling an incoming InputStream on the fly via org.springframework.util.StreamUtils#copy(inputStream, outputStream). This operation:
  - is very slow compared to a file or input stream upload: with the latter I see speeds of 7 MB/s on my connection, while the output stream write shows only ~140 KiB/s, and
  - results in completely wrong blob sizes on the storage: when I tried with a random 3 MB input stream, the blob size on the Azure portal is shown as just 4 KiB.
  - I am also not sure, even if it were faster and produced a correct file size, whether it would work for large files.
Generally, I cannot buffer in memory or on the disk of the host machines due to resource constraints; it wouldn't scale. Also, the data can be potentially large (several gigabytes): buffering it in memory would quickly cause out-of-memory errors for even a single client, and there may be thousands of parallel uploads by different clients. The same applies to disk space when using temp files as a workaround, so that is not an option either.
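A workaround I am considering (a minimal sketch, not verified against the preview SDK; the names stageBlock and commitBlockList are from the v12 block blob API, and the 8 MiB block size and the ChunkedUploader class are my own choices): split the incoming stream into blocks myself and stage them one at a time, so only one block is ever held in memory per upload.

```java
import com.azure.storage.blob.specialized.BlockBlobClient;

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

public final class ChunkedUploader
{
    // 8 MiB per block keeps memory bounded; with the service's 50,000-block
    // limit this still allows blobs of several hundred gigabytes.
    private static final int BLOCK_SIZE = 8 * 1024 * 1024;

    public static void upload(final BlockBlobClient blobClient, final InputStream in) throws IOException
    {
        final List<String> blockIds = new ArrayList<>();
        final byte[] buffer = new byte[BLOCK_SIZE];
        int blockIndex = 0;
        int read;
        while ((read = readFully(in, buffer)) > 0)
        {
            // Block IDs must be base64-encoded and of equal length within one blob.
            final String blockId = Base64.getEncoder()
                .encodeToString(String.format("%08d", blockIndex++).getBytes(StandardCharsets.UTF_8));
            blobClient.stageBlock(blockId, new ByteArrayInputStream(buffer, 0, read), read);
            blockIds.add(blockId);
        }
        blobClient.commitBlockList(blockIds); // assembles the staged blocks into the final blob
    }

    // Fills the buffer as far as possible; returns bytes read (0 at end of stream).
    private static int readFully(final InputStream in, final byte[] buffer) throws IOException
    {
        int total = 0;
        while (total < buffer.length)
        {
            final int n = in.read(buffer, total, buffer.length - total);
            if (n < 0)
            {
                break;
            }
            total += n;
        }
        return total;
    }
}
```

With 8 MiB blocks, a 10 GiB upload would stage 1,280 blocks, far below the 50,000-block limit, and peak memory per upload stays at one block buffer.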
Code snippets
public void test()
{
    final Path file = Paths.get("~/Downloads/random-file.zip");
    try (final RandomAccessFile raf = new RandomAccessFile(file.toString(), "rw");
         final InputStream is = Files.newInputStream(file))
    {
        final long largeFileSize = 1024 * 1024 * 1024 * 10L; // 10 GB
        final long smallFileSize = 1024 * 1024 * 3L; // 3 MB
        raf.setLength(largeFileSize);
        //raf.setLength(smallFileSize); // -> use for trying out writing to the output stream
        cloudStorageClient.upload("random-file.zip", is, largeFileSize); // -> fails; I would expect to be able to use it with any content length
        // writeToOutputStream(is); // -> this is very slow and produces false blob sizes.
    }
    catch (final IOException | CloudStorageException e)
    {
        LOG.error("Test failed", e);
    }
    finally
    {
        try
        {
            Files.deleteIfExists(file);
        }
        catch (final IOException e)
        {
            LOG.error("Test clean-up failed", e);
        }
    }
}
private void writeToOutputStream(final InputStream inputStream)
{
    try (final OutputStream outputStream = containerClient.getBlockBlobClient("random-file.zip").getBlobOutputStream())
    {
        LOG.debug("Channeling content from input stream to blob storage...");
        org.springframework.util.StreamUtils.copy(inputStream, outputStream);
    }
    catch (final IOException | StorageException e)
    {
        LOG.error("Test failed", e);
    }
}
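One experiment for the slowness: Spring's StreamUtils.copy uses a fixed 4 KiB buffer internally, so each write hands the blob output stream only a tiny chunk. A hand-rolled copy loop with a much larger buffer may behave differently (a plain java.io sketch; the 1 MiB buffer size and BufferedCopy name are my own, not tuned or SDK-recommended values):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public final class BufferedCopy
{
    // 1 MiB buffer instead of StreamUtils' 4 KiB default; purely an experiment.
    private static final int BUFFER_SIZE = 1024 * 1024;

    // Copies all bytes from in to out and returns the total byte count.
    public static long copy(final InputStream in, final OutputStream out) throws IOException
    {
        final byte[] buffer = new byte[BUFFER_SIZE];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1)
        {
            out.write(buffer, 0, read);
            total += read;
        }
        out.flush();
        return total;
    }

    public static void main(final String[] args) throws IOException
    {
        final byte[] data = new byte[3 * 1024 * 1024]; // 3 MB stands in for the random file
        final ByteArrayOutputStream sink = new ByteArrayOutputStream();
        final long copied = copy(new ByteArrayInputStream(data), sink);
        System.out.println(copied == data.length && sink.size() == data.length); // prints "true"
    }
}
```

In writeToOutputStream above, BufferedCopy.copy(inputStream, outputStream) would replace the StreamUtils.copy call; the in-memory sink here only verifies that the loop copies the full content without losing bytes.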
Why is this not a bug or a feature request? The input stream upload seems to adhere to the API limit of a 256 MB request body, so it's technically not a bug. But I am not sure whether the SDK shouldn't handle this transparently in its implementation, which is why I am not filing a feature request for now.
The issue with writing to the output stream may be a bug, but I am not sure yet.
Setup (please complete the following information if applicable):
- OS: Mac OS 10.14 (Mojave)
- IDE : IntelliJ
- SDK version: 12.0.0-preview.2
Issue Analytics
- State:
- Created 4 years ago
- Reactions: 1
- Comments: 14 (9 by maintainers)
Hi @rickle-msft, I was now able to test writing to the output stream again with preview 4: I basically copied a file InputStream to the blob OutputStream via standard stream utils copy methods. While this can be slower than a direct input stream upload, it shouldn't be this slow?

Hi, we're sending this friendly reminder because we haven't heard back from you in a while. We need more information about this issue to help address it. Please be sure to give us your input within the next 7 days. If we don't hear back from you within 14 days of this comment, the issue will be automatically closed. Thank you!