`SpillBuffer.spilled_total` appears to return incorrect results
When spilling data to disk, the SpillBuffer appears to return the incorrect size of the data on disk. For example, when spilling an ~8 MB random matrix onto disk, the data file created by the SpillBuffer is also ~8 MB in size, yet the SpillBuffer only reports ~1 MB.
Reproducer:
from __future__ import annotations

import os
import tempfile

import numpy as np

from distributed.spill import SpillBuffer


def test_spill_size():
    tmpdir = tempfile.mkdtemp()
    buf = SpillBuffer(spill_directory=tmpdir, target=0, max_spill=False)
    data = np.random.random((1024, 1024))
    buf["data"] = data
    spill_size = os.stat(os.path.join(tmpdir, "data")).st_size
    assert buf.spilled_total.disk == spill_size
fails with
> assert buf.spilled_total.disk == spill_size
E assert 1048808 == 8388840
E + where 1048808 = SpilledSize(memory=8388608, disk=1048808).disk
E + where SpilledSize(memory=8388608, disk=1048808) = Buffer<<LRU: 0/0 on dict>, <zict.cache.Cache object at 0x11f2413d0>>.spilled_total
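A quick look at the numbers in the failure, as a sketch only: the three values are copied from the output above, and reading the 232-byte difference as serialization overhead is an assumption, not something confirmed from the SpillBuffer source. The file size and the reported size both exceed a round payload size by the same amount, and the two payloads differ by a factor of 8, which is the item size of the float64 values produced by `np.random.random`.

```python
import numpy as np

# Values copied from the failing assertion above.
reported_disk = 1048808  # buf.spilled_total.disk
actual_disk = 8388840    # os.stat(...).st_size
in_memory = 8388608      # SpilledSize.memory

# Assumption: whatever is on disk beyond the raw array bytes is overhead.
overhead = actual_disk - in_memory

# Remove the same overhead from the reported figure.
reported_payload = reported_disk - overhead

# np.random.random returns float64, i.e. 8 bytes per item.
itemsize = np.dtype("float64").itemsize
ratio = in_memory // reported_payload

print(overhead, reported_payload, ratio, itemsize)
```

If that reading is right, the reported figure counts items where it should count bytes.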
Top GitHub Comments
I was not aware that `len(memoryview) != memoryview.nbytes`. Good catch! No need to wait for my return.

That line was suggested by Guido when we were working on this. I'm sure he'll have an idea of what could be happening here. I know he is on PTO, but he might be a good person to review this. He will be back next week.
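For anyone else surprised by this, here is a minimal standalone sketch of the `len(memoryview)` vs `memoryview.nbytes` difference. It uses only NumPy and the built-in `memoryview`; it does not touch SpillBuffer, and how the frames are actually produced inside `distributed` is not shown here. The variable names are illustrative.

```python
import numpy as np

data = np.random.random((1024, 1024))  # float64, 8 bytes per item

# A 1-D memoryview over float64 items: len() counts items, not bytes.
flat = memoryview(data.reshape(-1))
len_flat = len(flat)
nbytes_flat = flat.nbytes

# For a multi-dimensional memoryview, len() is only the first dimension.
len_2d = len(memoryview(data))

# Casting to unsigned bytes makes len() and nbytes agree.
as_bytes = flat.cast("B")

print(len_flat, nbytes_flat, len_2d, len(as_bytes))
```

So when summing up frame sizes, `nbytes` is the safer attribute, since `len()` only matches it for byte-format, one-dimensional views.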