
Different binary produced when producing ZIP to Azure Blob Stream


We found that our ZIP files were not opening correctly on Macs; ditto failed with the error: ditto: Couldn't read pkzip signature.

After further investigation, it seems that ZIP files written to a local file (which open fine on a Mac) differ from those written to a stream that persists to Azure Blob Storage. However, I also tested opening an existing local ZIP as a file stream and persisting it through the blob stream, and the uploaded copy was byte-identical to the original. I think this indicates that something in the zipping process is causing the problem.

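From a binary comparison, the differences seem to be at the start and end of the files, though I am not sure which parts are significant or what they indicate is going wrong. On the left is the file persisted via the blob stream; on the right is the file written locally to disk.
Start:
[Screenshot: binary comparison of the start of both files]
End:
[Screenshot: binary comparison of the end of both files]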
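To see what the differing bytes at the start mean, it may help to decode the first local file header of each archive. This is a minimal sketch assuming the standard layout from PKWARE's APPNOTE (signature, then general-purpose flags at offset 6, method at 8, CRC-32 at 14, compressed and uncompressed sizes at 18 and 22); `ZipHeaderInspector` and `LocalHeaderInfo` are hypothetical names, not SharpZipLib types. If the blob-stream file has general-purpose bit 3 set with zero CRC and sizes while the local file does not, its entry relies on a trailing data descriptor, which is the direction the maintainers investigate in the comments.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;

// Fields decoded from a ZIP local file header (hypothetical helper type).
public sealed record LocalHeaderInfo(
    ushort Flags, ushort Method, uint Crc32, uint CompressedSize, uint UncompressedSize)
{
    // General-purpose bit 3: CRC-32 and sizes are written as zero here and
    // supplied in a data descriptor after the compressed data instead.
    public bool UsesDataDescriptor => (Flags & 0x0008) != 0;
}

public static class ZipHeaderInspector
{
    private const uint LocalHeaderSignature = 0x04034b50; // bytes "PK\x03\x04"
    private const int FixedHeaderLength = 30;

    // Decodes the local file header that starts at offset 0 of the archive.
    public static LocalHeaderInfo ReadFirstLocalHeader(ReadOnlySpan<byte> zip)
    {
        if (zip.Length < FixedHeaderLength ||
            BinaryPrimitives.ReadUInt32LittleEndian(zip) != LocalHeaderSignature)
        {
            throw new InvalidDataException("No ZIP local file header at offset 0.");
        }

        // All multi-byte ZIP header fields are little-endian.
        return new LocalHeaderInfo(
            Flags: BinaryPrimitives.ReadUInt16LittleEndian(zip.Slice(6)),
            Method: BinaryPrimitives.ReadUInt16LittleEndian(zip.Slice(8)),
            Crc32: BinaryPrimitives.ReadUInt32LittleEndian(zip.Slice(14)),
            CompressedSize: BinaryPrimitives.ReadUInt32LittleEndian(zip.Slice(18)),
            UncompressedSize: BinaryPrimitives.ReadUInt32LittleEndian(zip.Slice(22)));
    }
}
```

Usage: `Console.WriteLine(ZipHeaderInspector.ReadFirstLocalHeader(File.ReadAllBytes("remote.zip")));`, then the same for local.zip. A size field of 0xFFFFFFFF means the real sizes live in a Zip64 extra field, which this sketch does not parse.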

You should be able to reproduce the issue with the following code. Uncomment the relevant sections to write a local ZIP instead, or to upload an existing local ZIP to blob storage. As written, the code produces a ZIP that cannot be opened on a Mac.

using System;
using System.IO; // needed by the commented-out MemoryStream experiment below
using System.Net.Http;
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Specialized;
using ICSharpCode.SharpZipLib.Zip;

internal class CreateZipFile
{
	public static void Main(string[] args)
	{
		MainAsync(args).GetAwaiter().GetResult();
	}

	private static async Task MainAsync(string[] args)
	{
		using var httpClient = new HttpClient();

		var _blobServiceClient = new BlobServiceClient("DefaultEndpointsProtocol=https;AccountName=<replace_with_account_name>;AccountKey=<replace_with_a_key>;EndpointSuffix=core.windows.net");
		var _containerClient = _blobServiceClient.GetBlobContainerClient("artifacts");
		var blobName = System.IO.Path.ChangeExtension(Guid.NewGuid().ToString(), "zip");
		var blockClient = _containerClient.GetBlockBlobClient(blobName);
		var artifactUploadStream = blockClient.OpenWrite(overwrite: true);

		try
		{
			string[] filenames = new[] {
				"https://peach.blender.org/wp-content/uploads/bbb-splash.png"
			};

			// Writing an already produced ZIP to blob store results in matching binary.

			//var localFileMs = new MemoryStream(await System.IO.File.ReadAllBytesAsync("C:\\DevTools\\Debug\\o-local.zip"));
			//localFileMs.Position = 0;
			//await localFileMs.CopyToAsync(artifactUploadStream);
			//await artifactUploadStream.FlushAsync();
			//await artifactUploadStream.DisposeAsync();
			//await localFileMs.DisposeAsync();
			//return;

			// Local file writing produces expected ZIP
			//using var zipStream = new ZipOutputStream(System.IO.File.Create("C:\\DevTools\\Debug\\o-local.zip"))
			using var zipStream = new ZipOutputStream(artifactUploadStream)
			{
				IsStreamOwner = true
			};

			// Download the first (and only) file listed above; disposed when the try block exits.
			using var downloadStream = await httpClient.GetStreamAsync(filenames[0]);

			var zipFileEntry = new ZipEntry("big-buck-bunny.png")
			{
				CompressionMethod = CompressionMethod.Deflated,
				DateTime = new DateTime(2021, 7, 13, 4, 28, 44),
			};

			zipStream.PutNextEntry(zipFileEntry);
			await downloadStream.CopyToAsync(zipStream);
			zipStream.CloseEntry();
			zipStream.Finish();
			zipStream.Close();
		}
		catch (Exception ex)
		{
			Console.WriteLine("Exception during processing {0}", ex);
		}
	}
}

I'm not sure where else to investigate, or whether this is instead an issue with the block blob stream, or even a mix of both.
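One way to separate the two suspects is to reproduce the upload's conditions without Azure. A plausible difference, consistent with the maintainers' later focus on descriptors, is that the stream returned by OpenWrite is forward-only while a FileStream is seekable, so ZipOutputStream cannot seek back to patch the CRC and sizes. The sketch below is an assumption-driven test helper: `NonSeekableStream` is a hypothetical class, not part of SharpZipLib or the Azure SDK, that forwards writes but reports CanSeek = false.

```csharp
using System;
using System.IO;

// Hypothetical test helper: forwards writes to an inner stream but reports
// CanSeek = false, so a ZIP writer behaves as it would on a forward-only
// upload stream.
public sealed class NonSeekableStream : Stream
{
    private readonly Stream _inner;
    private readonly bool _leaveOpen;
    private long _written;

    public NonSeekableStream(Stream inner, bool leaveOpen = false)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
        _leaveOpen = leaveOpen;
    }

    public override bool CanRead => false;
    public override bool CanSeek => false;
    public override bool CanWrite => _inner.CanWrite;

    public override long Length => throw new NotSupportedException();

    // Reading Position reports bytes written so far; setting it is not allowed.
    public override long Position
    {
        get => _written;
        set => throw new NotSupportedException();
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        _inner.Write(buffer, offset, count);
        _written += count;
    }

    public override void Flush() => _inner.Flush();

    public override int Read(byte[] buffer, int offset, int count) => throw new NotSupportedException();
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();

    protected override void Dispose(bool disposing)
    {
        // Dispose the wrapped stream unless the caller asked to keep it open.
        if (disposing && !_leaveOpen)
        {
            _inner.Dispose();
        }
        base.Dispose(disposing);
    }
}
```

In the repro, the local-file variant could wrap its target, as in `new ZipOutputStream(new NonSeekableStream(System.IO.File.Create("C:\\DevTools\\Debug\\o-local.zip")))`. If that local file also fails in ditto, non-seekability rather than Azure itself would be the likely trigger.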

Edit: here are the files to compare: remote.zip and local.zip.
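If the forward-only upload stream is the trigger, one workaround is to build the archive in a seekable buffer and then upload the finished bytes, mirroring the commented-out experiment in the repro where copying an already produced ZIP through the blob stream gave a matching binary. This sketch assumes SharpZipLib patches the local header in place, instead of setting bit 3 and writing a data descriptor, whenever its output stream is seekable; `BufferedZip.WriteSingleEntryZipAsync` is a hypothetical helper name.

```csharp
using System.IO;
using System.Threading.Tasks;
using ICSharpCode.SharpZipLib.Zip;

public static class BufferedZip
{
    // Hypothetical helper: builds a single-entry ZIP in a seekable MemoryStream,
    // then copies the finished archive to a (possibly forward-only) destination.
    public static async Task WriteSingleEntryZipAsync(Stream source, ZipEntry entry, Stream destination)
    {
        using var buffer = new MemoryStream();

        // IsStreamOwner = false keeps the buffer open after the ZIP stream is disposed.
        using (var zip = new ZipOutputStream(buffer) { IsStreamOwner = false })
        {
            zip.PutNextEntry(entry);
            await source.CopyToAsync(zip);
            zip.CloseEntry();
            zip.Finish(); // writes the central directory; a later Dispose is a no-op here
        }

        // Upload the completed archive from the beginning.
        buffer.Position = 0;
        await buffer.CopyToAsync(destination);
        await destination.FlushAsync();
    }
}
```

With the repro's variables this would be `await BufferedZip.WriteSingleEntryZipAsync(downloadStream, zipFileEntry, artifactUploadStream);` followed by disposing artifactUploadStream. A MemoryStream holds the whole archive in memory; for large artifacts, a temporary FileStream is a seekable alternative.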

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 11 (6 by maintainers)

Top GitHub Comments

1 reaction
piksel commented, Apr 28, 2022

I did some more investigation on macOS into ZIP files that use data descriptors. I produced a ZIP file using InfoZip:

dd if=/dev/random bs=1M count=100 | zip | dd of=rand3.zip

I confirmed that it was actually using descriptors, and also noted that TestArchive considered the archive invalid: https://pub.p1k.se/sharpziplib/archivediag/rand3.zip.html

However, ditto could read that one, so there is clearly some discrepancy between the files SharpZipLib produces and those InfoZip produces…
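To compare where SharpZipLib and InfoZip place their descriptors, and whether each includes the optional "PK\x07\x08" signature, a crude byte scan may help. This is a rough sketch, not a parser: `DataDescriptorScanner` is a hypothetical name, it reads only the CRC-32 after each signature because the size fields are 4 bytes each or 8 in Zip64 descriptors, and a hit can also be a chance match inside compressed data.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;

public static class DataDescriptorScanner
{
    // Optional data-descriptor signature 0x08074b50, as it appears on disk.
    private static ReadOnlySpan<byte> Signature => new byte[] { 0x50, 0x4b, 0x07, 0x08 };

    // Returns (offset, CRC-32) for each candidate descriptor. Treat hits as
    // candidates only: the same 4 bytes can occur inside compressed data.
    public static List<(long Offset, uint Crc32)> Find(ReadOnlySpan<byte> zip)
    {
        var hits = new List<(long Offset, uint Crc32)>();
        int pos = 0;

        // Each candidate needs 8 bytes: the signature plus the CRC-32.
        while (pos <= zip.Length - 8)
        {
            int idx = zip.Slice(pos).IndexOf(Signature);
            if (idx < 0 || pos + idx > zip.Length - 8)
            {
                break;
            }

            int at = pos + idx;
            hits.Add((at, BinaryPrimitives.ReadUInt32LittleEndian(zip.Slice(at + 4))));
            pos = at + 1;
        }

        return hits;
    }
}
```

Running `DataDescriptorScanner.Find(File.ReadAllBytes(...))` on rand3.zip and on a SharpZipLib archive, and comparing the CRCs against the central directory entries, could show whether the two tools lay out their descriptors differently.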

1 reaction
piksel commented, Apr 8, 2022

ArchiveDiag can be found in the tools/archivediag branch.

I will do some more investigation to see whether it's possible to create .zip files that ditto (and Archive Utility) can read. I also found the topic you linked, but it doesn't actually say that descriptors are unsupported, so ditto might be sensitive to something else.
