Different binary produced when producing ZIP to Azure Blob Stream
We found that our ZIP files could not be opened correctly on Macs; ditto fails with the following error:
ditto: Couldn't read pkzip signature.
After further investigation, it seems that ZIP files written to a local file (which can be opened on a Mac) differ from those written to a stream that persists to Azure Blob Storage. However, I also tested opening an existing ZIP as a file stream and persisting it through the blob stream, which resulted in an identical binary. I think this indicates that the zipping process itself, rather than the upload, is causing the problem.
From a binary comparison, the differences appear to be at the start and end of the files, though I am not sure which parts are significant or what they say about what is going wrong.
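On the left is the file persisted via the blob stream; on the right is the file written locally to disk.
Start: [screenshot: binary comparison of the start of both files]
End: [screenshot: binary comparison of the end of both files]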
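A comparison like this can also be done in code, which makes it easier to see exactly where two archives diverge. The following is only a minimal sketch: the class name ZipDiff and the method FirstDifference are hypothetical helpers, not part of SharpZipLib or the Azure SDK, and the two file paths are expected on the command line.

```csharp
using System;
using System.IO;

internal static class ZipDiff
{
    // Returns the offset of the first differing byte, or -1 if both inputs are identical.
    // If one input is a strict prefix of the other, the length of the shorter input is returned.
    public static long FirstDifference(byte[] a, byte[] b)
    {
        int common = Math.Min(a.Length, b.Length);
        for (int i = 0; i < common; i++)
        {
            if (a[i] != b[i])
            {
                return i;
            }
        }
        return a.Length == b.Length ? -1 : common;
    }

    public static void Main(string[] args)
    {
        if (args.Length < 2)
        {
            Console.WriteLine("usage: ZipDiff <first.zip> <second.zip>");
            return;
        }

        byte[] first = File.ReadAllBytes(args[0]);
        byte[] second = File.ReadAllBytes(args[1]);

        Console.WriteLine("Lengths: {0} vs {1}", first.Length, second.Length);
        Console.WriteLine("First difference at offset: {0}", FirstDifference(first, second));
    }
}
```

Running it against remote.zip and local.zip should report the first offset where they differ, which can then be matched against the ZIP local file header layout.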
You should be able to reproduce the issue with the following code. You can uncomment the various sections if you wish to write a local zip, or use a local file to upload to a blob store. Otherwise the uncommented code should produce a ZIP that would be invalid when opening on a Mac.
using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Specialized;
using ICSharpCode.SharpZipLib.Zip;

internal class CreateZipFile
{
    public static void Main(string[] args)
    {
        MainAsync(args).GetAwaiter().GetResult();
    }

    private static async Task MainAsync(string[] args)
    {
        var httpClient = new HttpClient();
        var _blobServiceClient = new BlobServiceClient("DefaultEndpointsProtocol=https;AccountName=<replace_with_account_name>;AccountKey=<replace_with_a_key>;EndpointSuffix=core.windows.net");
        var _containerClient = _blobServiceClient.GetBlobContainerClient("artifacts");
        var blobName = System.IO.Path.ChangeExtension(Guid.NewGuid().ToString(), "zip");
        var blockClient = _containerClient.GetBlockBlobClient(blobName);
        // Write stream that persists directly to the block blob.
        var artifactUploadStream = blockClient.OpenWrite(overwrite: true);
        try
        {
            string[] filenames = new[] {
                "https://peach.blender.org/wp-content/uploads/bbb-splash.png"
            };

            // Writing an already produced ZIP to blob store results in matching binary.
            //var localFileMs = new MemoryStream(await System.IO.File.ReadAllBytesAsync("C:\\DevTools\\Debug\\o-local.zip"));
            //localFileMs.Position = 0;
            //await localFileMs.CopyToAsync(artifactUploadStream);
            //await artifactUploadStream.FlushAsync();
            //await artifactUploadStream.DisposeAsync();
            //await localFileMs.DisposeAsync();
            //return;

            // Local file writing produces expected ZIP
            //using var zipStream = new ZipOutputStream(System.IO.File.Create("C:\\DevTools\\Debug\\o-local.zip"))
            using var zipStream = new ZipOutputStream(artifactUploadStream)
            {
                IsStreamOwner = true
            };

            // Download the sample file and add it to the archive as a single entry.
            var downloadStream = await httpClient.GetStreamAsync(filenames[0]);
            var zipFileEntry = new ZipEntry("big-buck-bunny.png")
            {
                CompressionMethod = CompressionMethod.Deflated,
                DateTime = new DateTime(2021, 7, 13, 4, 28, 44),
            };
            zipStream.PutNextEntry(zipFileEntry);
            await downloadStream.CopyToAsync(zipStream);
            zipStream.CloseEntry();
            zipStream.Finish();
            zipStream.Close();
        }
        catch (Exception ex)
        {
            Console.WriteLine("Exception during processing {0}", ex);
        }
    }
}
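Since uploading an already produced ZIP gives a matching binary, one possible workaround is to build the archive in a seekable buffer first and only then copy it to the blob stream. The sketch below assumes that the relevant difference is that the blob upload stream is not seekable; that is a guess, not something confirmed in this issue. BufferedZip and BuildZip are hypothetical names, and a MemoryStream stands in for the blob upload stream so the example runs without an Azure account.

```csharp
using System;
using System.IO;
using ICSharpCode.SharpZipLib.Zip;

internal static class BufferedZip
{
    // Builds a single-entry ZIP in a seekable MemoryStream and returns it rewound to position 0.
    public static MemoryStream BuildZip(Stream content, string entryName)
    {
        var buffer = new MemoryStream();
        // IsStreamOwner = false keeps the buffer open after the ZIP stream is disposed.
        using (var zipStream = new ZipOutputStream(buffer) { IsStreamOwner = false })
        {
            var entry = new ZipEntry(entryName)
            {
                CompressionMethod = CompressionMethod.Deflated
            };
            zipStream.PutNextEntry(entry);
            content.CopyTo(zipStream);
            zipStream.CloseEntry();
            zipStream.Finish();
        }
        buffer.Position = 0;
        return buffer;
    }

    public static void Main()
    {
        using var content = new MemoryStream(new byte[] { 1, 2, 3, 4 });
        using var zip = BuildZip(content, "sample.bin");

        // Stand-in for the blob upload stream from the reproduction code (assumption).
        using var destination = new MemoryStream();
        zip.CopyTo(destination);
        Console.WriteLine("Wrote {0} bytes", destination.Length);
    }
}
```

The trade-off is that the whole archive is held in memory before upload, so this only suits archives of modest size.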
I am not sure where else to investigate, or whether this is more an issue with the block blob stream, or even a mix of both.
Edit: Here are the files to compare: remote.zip local.zip
Issue Analytics
- State:
- Created a year ago
- Comments: 11 (6 by maintainers)
I did some more investigation into zip files that use descriptors on macOS. I produced a zip file using InfoZip.
I confirmed that it was actually using descriptors, and also noted that TestArchive considered the archive invalid: https://pub.p1k.se/sharpziplib/archivediag/rand3.zip.html
But that one could be read by ditto. So there is clearly some discrepancy between the files that SharpZipLib produces and the ones that InfoZip produces…
ArchiveDiag can be found in the tools/archivediag branch.
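For a quick check without ArchiveDiag, whether the first entry uses a descriptor can be read from the local file header: in the ZIP format, bit 3 of the general purpose bit flag (the two bytes at offset 6) signals that sizes and CRC follow the data in a data descriptor. This is a minimal sketch; ZipHeaderProbe and FirstEntryUsesDescriptor are hypothetical names, and only the first local file header is inspected.

```csharp
using System;
using System.IO;

internal static class ZipHeaderProbe
{
    private const ushort DataDescriptorFlag = 0x0008; // general purpose bit 3

    // Returns true or false depending on bit 3 of the first local file header,
    // or null if the data does not start with the signature "PK\x03\x04".
    public static bool? FirstEntryUsesDescriptor(byte[] data)
    {
        if (data.Length < 8)
        {
            return null;
        }

        bool hasSignature = data[0] == 0x50 && data[1] == 0x4B && data[2] == 0x03 && data[3] == 0x04;
        if (!hasSignature)
        {
            return null;
        }

        // Offsets 4-5 hold "version needed to extract"; offsets 6-7 hold the flags (little-endian).
        ushort flags = (ushort)(data[6] | (data[7] << 8));
        return (flags & DataDescriptorFlag) != 0;
    }

    public static void Main(string[] args)
    {
        if (args.Length < 1)
        {
            Console.WriteLine("usage: ZipHeaderProbe <file.zip>");
            return;
        }

        bool? result = FirstEntryUsesDescriptor(File.ReadAllBytes(args[0]));
        Console.WriteLine(result.HasValue
            ? "First entry uses a data descriptor: " + result.Value
            : "No local file header found at the start of the file");
    }
}
```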
I will do some more investigation to see if it’s possible to create .zip files that ditto (and Archive Utility) can read. I found the topic you linked as well, but no actual mention that Descriptors are not supported, so there might be something else that it’s more sensitive to.