[Bug] Memory usage when upload file to DataLake gen2

Query/Question: I’m using Azure.Storage.Files.DataLake to send data every minute from my C# code.

public class DataLakeStorage : IFileStorage
{
	private readonly DataLakeServiceClient _dataLakeClient;

	public DataLakeStorage(FileStorageOptions options)
	{
		var sharedKeyCredentials = new StorageSharedKeyCredential(options.AccountName, options.AccountKey);
		_dataLakeClient = new DataLakeServiceClient(new Uri($"https://{options.AccountName}.dfs.core.windows.net"), sharedKeyCredentials);
	}

	public async Task Add(string containerName, string path, string name, object data)
	{
		DataLakeFileSystemClient fileSystemClient = _dataLakeClient.GetFileSystemClient(containerName.ToLower());
		await fileSystemClient.CreateIfNotExistsAsync();

		DataLakeDirectoryClient directoryClient = fileSystemClient.GetDirectoryClient(path);
		DataLakeFileClient fileClient = await directoryClient.CreateFileAsync(name);

		await using var stream = new MemoryStream();
		await JsonSerializer.SerializeAsync(stream, data, data.GetType());
		stream.Position = 0;
		await fileClient.AppendAsync(stream, 0);
		await fileClient.FlushAsync(stream.Length);
	}
}

DataLakeStorage is registered as a singleton in a web application. When I started using this code to send data every minute, the application's memory usage grew sharply, even though each file is only ~150 bytes.

  • App without sending to Data Lake, after 3 days of uptime: ~170 MB
  • App with sending to Data Lake, after 6 hours of uptime: ~900 MB

Where could the issue be? Could you please suggest what I’m doing wrong when sending data to Data Lake?
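One way to narrow this down, offered as a minimal sketch rather than a confirmed diagnosis, is to log the managed heap size next to the process working set after each upload. If the working set grows while the managed heap stays flat, the growth is probably outside the managed heap (native buffers, or memory the GC has not yet returned to the OS). `MemoryProbe` is a hypothetical helper name, not part of the original code or of the Azure SDK; it uses only `GC.GetTotalMemory` and `Process.WorkingSet64` from the base class library.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper (an assumption, not from the original issue):
// captures the managed heap size and the process working set together
// so that the two numbers can be compared over time.
public static class MemoryProbe
{
    public static (long ManagedBytes, long WorkingSetBytes) Capture()
    {
        // Estimated bytes currently allocated on the managed heap.
        // Passing false avoids forcing a full collection just to measure.
        long managed = GC.GetTotalMemory(forceFullCollection: false);

        // Physical memory currently mapped to this process.
        using Process process = Process.GetCurrentProcess();
        long workingSet = process.WorkingSet64;

        return (managed, workingSet);
    }

    public static string Format((long ManagedBytes, long WorkingSetBytes) sample)
    {
        const double BytesPerMb = 1024.0 * 1024.0;
        return $"managed={sample.ManagedBytes / BytesPerMb:F1} MB, " +
               $"workingSet={sample.WorkingSetBytes / BytesPerMb:F1} MB";
    }
}
```

Calling `MemoryProbe.Format(MemoryProbe.Capture())` at the end of `Add` and writing the result to the application log would give one sample per minute to compare against the totals above.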

UPDATE: I also tried await fileClient.UploadAsync(stream, overwrite: true); instead of

                await fileClient.AppendAsync(stream, 0);
                await fileClient.FlushAsync(stream.Length);

and added

  <PropertyGroup>
    <ServerGarbageCollection>false</ServerGarbageCollection>
  </PropertyGroup>

but the result is the same: the application still uses a lot of memory.
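Because the setting had no visible effect, it may be worth confirming at runtime that workstation GC is actually in use inside the container. The following is a small sketch under that assumption; `GcModeCheck` is a hypothetical name, and the only APIs used are `GCSettings.IsServerGC`, `GCSettings.LatencyMode`, and `Environment.ProcessorCount`.

```csharp
using System;
using System.Runtime;

// Hypothetical helper (an assumption, not from the original issue):
// reports which GC mode the running process actually uses.
public static class GcModeCheck
{
    public static string Describe() =>
        $"serverGC={GCSettings.IsServerGC}, " +
        $"latencyMode={GCSettings.LatencyMode}, " +
        $"processors={Environment.ProcessorCount}";
}
```

Logging `GcModeCheck.Describe()` once at startup shows whether the `ServerGarbageCollection` property reached the deployed application or was overridden by the runtime configuration.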

Environment:

  • Package: Azure.Storage.Files.DataLake 12.4.0
  • App: netcoreapp3.1
  • OS: Linux in container

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 7 (6 by maintainers)

Top GitHub Comments

jsquire commented, Mar 5, 2021 (2 reactions)

Hi @Marusyk. Unfortunately, I’m only serving as first triage in this case and have no insight into the status of this issue, nor into the process that the owning team uses for triage. I’ll need to defer to @sumantmehtams and @xgithubtriage for assistance.

Marusyk commented, Apr 18, 2021 (0 reactions)

Hi @sumantmehtams and @xgithubtriage, any comments from your side?
