Locks and chunked storage

dask.array.store takes an optional argument, lock, which (to my understanding) avoids write contention by forcing workers to request access to the write target before writing. But for chunked storage like zarr arrays, write contention happens at the level of individual chunks, not the entire array. So perhaps a lock for chunked writes should have the granularity of the chunk structure of the storage target, thereby allowing the scheduler to control access to individual chunks. Does this make sense?
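
To make the idea concrete, here is a minimal sketch of chunk-granular locking. It is not dask's or zarr's API: `ChunkLockRegistry`, `chunks_touched`, and `locked` are hypothetical names, it uses only the standard library's `threading.Lock`, and it assumes write regions are non-empty tuples of slices with explicit, non-negative bounds and no step. It only coordinates threads in one process; a multi-worker cluster would need a distributed lock in place of `threading.Lock`.

```python
import threading
from contextlib import contextmanager
from itertools import product


class ChunkLockRegistry:
    """Hypothetical helper: one lock per storage chunk, not one per array."""

    def __init__(self, chunk_shape):
        self.chunk_shape = tuple(chunk_shape)
        self._locks = {}
        self._guard = threading.Lock()  # protects the dict of locks itself

    def chunks_touched(self, region):
        """Return the sorted indices of storage chunks that `region` overlaps.

        Assumes `region` is a tuple of non-empty slices with explicit start
        and stop and no step.
        """
        per_axis = [
            range(s.start // c, (s.stop - 1) // c + 1)
            for s, c in zip(region, self.chunk_shape)
        ]
        return sorted(product(*per_axis))

    @contextmanager
    def locked(self, region):
        """Hold the locks of every storage chunk that `region` overlaps."""
        keys = self.chunks_touched(region)
        with self._guard:
            locks = [self._locks.setdefault(k, threading.Lock()) for k in keys]
        # Always acquire in sorted chunk order so two writers cannot deadlock.
        for lock in locks:
            lock.acquire()
        try:
            yield keys
        finally:
            for lock in reversed(locks):
                lock.release()


# Storage chunks are 4x4; this write region straddles two of them on axis 0.
registry = ChunkLockRegistry(chunk_shape=(4, 4))
with registry.locked((slice(2, 6), slice(0, 4))) as keys:
    pass  # write the block to the storage target here
```

With this layout, two writers block each other only when their regions overlap a common storage chunk; writes to disjoint chunks proceed concurrently, which is the granularity the question asks about.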

The context for this question is my attempt to optimize storing multiple enormous dask arrays in zarr containers with a minimum amount of rechunking, which makes the locking mechanism attractive, as long as the lock happens at the right place.

cc @jakirkham in case you have ideas about this.

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 29 (22 by maintainers)

Top GitHub Comments

2 reactions
d-v-b commented, Jul 21, 2021

1 reaction
gjoseph92 commented, Jul 20, 2021

If it’s not too much trouble, I’d love to see a before/after performance report for 2021.07.0. Did it get worse both with and without the final rechunk?

It’s true that it’s a bit silly to move the data between workers just for the purpose of concatenating and storing it. Having workers synchronize writing the data they already have is simpler in many ways. I think we just all want this rechunking to perform better, so we’re eager to understand why it’s not working 😄
