`SpillBuffer.spilled_total` appears to return incorrect results

When spilling data to disk, the SpillBuffer appears to report an incorrect size for the data on disk. For example, when spilling an ~8 MB random matrix, the data file created by the SpillBuffer is also ~8 MB in size, yet `spilled_total.disk` reports only ~1 MB.

Reproducer:

from __future__ import annotations

import os
import tempfile

import numpy as np

from distributed.spill import SpillBuffer


def test_spill_size():
    tmpdir = tempfile.mkdtemp()
    buf = SpillBuffer(spill_directory=tmpdir, target=0, max_spill=False)
    data = np.random.random((1024, 1024))
    buf["data"] = data
    spill_size = os.stat(os.path.join(tmpdir, "data")).st_size
    assert buf.spilled_total.disk == spill_size

fails with

>       assert buf.spilled_total.disk == spill_size
E       assert 1048808 == 8388840
E        +  where 1048808 = SpilledSize(memory=8388608, disk=1048808).disk
E        +    where SpilledSize(memory=8388608, disk=1048808) = Buffer<<LRU: 0/0 on dict>, <zict.cache.Cache object at 0x11f2413d0>>.spilled_total

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

1 reaction
crusaderky commented, Jul 26, 2022

I was not aware that len(memoryview) != memoryview.nbytes. Good catch! No need to wait for my return.
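
The difference between `len()` and `.nbytes` on a memoryview can be reproduced with the standard library alone. The sketch below is only an illustration under stated assumptions, not the code from distributed: it uses a 1-D buffer of 1024 * 1024 doubles (the same number of elements as the matrix in the reproducer), and `frame_nbytes` is a hypothetical helper name, not a function from the library. The two results differ by a factor of 8, which matches the ~1 MB vs ~8 MB gap in the failing assertion.

```python
from array import array

# 1024 * 1024 doubles, the same element count as the matrix in the reproducer.
N_ITEMS = 1024 * 1024
values = array("d", [0.0]) * N_ITEMS
view = memoryview(values)

# len() counts items along the first dimension, not bytes.
items = len(view)

# .nbytes is the size of the underlying buffer in bytes.
size = view.nbytes

# Casting to unsigned bytes makes len() and .nbytes agree.
byte_view = view.cast("B")


def frame_nbytes(frame):
    """Hypothetical helper: byte size of a bytes-like frame.

    Uses .nbytes for memoryviews, where len() may count multi-byte items,
    and len() for bytes and bytearray, where one item is one byte.
    """
    if isinstance(frame, memoryview):
        return frame.nbytes
    return len(frame)


print(items, size, len(byte_view))
```

Summing `frame_nbytes` over the frames written to disk, rather than `len`, is one way to keep the reported size in bytes regardless of each frame's item size.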

0 reactions
ncclementi commented, Jul 25, 2022

That line was suggested by Guido when we were working on this. I’m sure he’ll have an idea of what could be happening here. I know he is on PTO, but he might be a good person to review this. He will be back next week.
