by() reduction doesn't combine with mean(), std() etc.
ALL software version info
Datashader master in a fresh virtualenv.
Description of expected behavior and the observed behavior
by() doesn't seem to be able to combine with the more complex reductions like mean(). I think it ignores _build_bases when constructing its append method, but I'm not adept enough with the code yet to fix it myself.
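To illustrate the suspected failure mode, here is a simplified stand-in model, not datashader's actual classes: a compound reduction like mean is built from base reductions (sum and count) via _build_bases and defines no _append of its own, so a wrapper that calls the inner reduction's _build_append directly fails with exactly the AttributeError shown in the traceback below.

```python
# Simplified model of the suspected bug -- NOT datashader's real code.
# Compound reductions (mean) have no _append; they decompose into base
# reductions (sum, count) via _build_bases. A wrapper that recurses into
# the wrapped reduction's _build_append without expanding bases fails.

class sum_red:
    _append = staticmethod(lambda acc, v: acc + v)
    def _build_append(self):
        return self._append

class count_red:
    _append = staticmethod(lambda acc, v: acc + 1)
    def _build_append(self):
        return self._append

class mean_red:
    # No _append here: mean is derived from its bases at finalize time.
    def _build_bases(self):
        return (sum_red(), count_red())
    def _build_append(self):
        return self._append  # AttributeError, as in the traceback

class by_naive:
    def __init__(self, reduction):
        self.reduction = reduction
    def _build_append(self):
        # Calls the wrapped reduction directly, ignoring _build_bases.
        return self.reduction._build_append()

try:
    by_naive(mean_red())._build_append()
except AttributeError as e:
    print(e)  # 'mean_red' object has no attribute '_append'
```

Wrapping a simple reduction (sum, count) works in this model, which matches the observation that count_cat succeeds while by() over mean() does not.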
Complete, minimal, self-contained example code that reproduces the issue
import numpy as np
import dask.dataframe
import datashader
import pandas as pd

if __name__ == '__main__':
    pf = pd.DataFrame(dict(a=np.arange(10), b=np.arange(10), c=np.arange(-5, 5), cat=[0,0,0,1,1,1,2,2,2,3]))
    ddf = dask.dataframe.from_pandas(pf, npartitions=1)
    ddf = ddf.categorize('cat')
    print(ddf)
    canvas = datashader.Canvas(10, 10)
    raster = canvas.points(ddf, 'a', 'b', datashader.count_cat('cat'))
    print("count_cat ok")
    raster = canvas.points(ddf, 'a', 'b', datashader.mean('c'))
    print("mean ok")
    raster = canvas.points(ddf, 'a', 'b', datashader.by('cat', datashader.mean('c')))
    print("by(cat, mean(c)) ok")
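The per-category mean that by('cat', mean('c')) should produce can be recovered from two simple reductions, a per-category sum and a per-category count, which is exactly the decomposition the _build_bases hypothesis above points at. A pure-pandas sketch of the expected result on the repro data (not using datashader, just to show what the aggregation should compute):

```python
import numpy as np
import pandas as pd

# Same 'c' and 'cat' columns as the repro script above.
df = pd.DataFrame(dict(c=np.arange(-5, 5),
                       cat=[0, 0, 0, 1, 1, 1, 2, 2, 2, 3]))

# mean decomposes into sum / count -- the base reductions mean() is
# built from, and the kind of simple reduction by() already handles.
grouped_sum = df.groupby('cat')['c'].sum()
grouped_count = df.groupby('cat')['c'].count()
manual_mean = grouped_sum / grouped_count

# Matches pandas' own grouped mean.
assert (manual_mean == df.groupby('cat')['c'].mean()).all()
print(manual_mean.to_dict())  # {0: -4.0, 1: -1.0, 2: 2.0, 3: 4.0}
```

Until by() expands the wrapped reduction's bases, computing the two simple per-category aggregations separately and dividing is a possible workaround.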
Stack traceback and/or browser JavaScript console output
(sms) oms@tshikovski:~/projects/shadeMS$ python ./test-ds.py
Dask DataFrame Structure:
a b c cat
npartitions=1
0 int64 int64 int64 category[known]
9 ... ... ... ...
Dask Name: categorize_block, 2 tasks
count_cat ok
mean ok
Traceback (most recent call last):
File "/home/oms/.venv/sms/lib/python3.6/site-packages/toolz/functoolz.py", line 456, in memof
return cache[k]
KeyError: ((<datashader.reductions.by object at 0x7fcc800a9978>, dshape("""{
a: int64,
b: int64,
c: int64,
cat: categorical[[0, 1, 2, 3], type=int64, ordered=False]
}"""), <datashader.glyphs.points.Point object at 0x7fcc800a9dd8>), frozenset({('cuda', False)}))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "./test-ds.py", line 27, in <module>
raster = canvas.points(ddf, 'a', 'b', datashader.by('cat', datashader.mean('c')))
File "/scratch/oms/projects/datashader/datashader/core.py", line 224, in points
return bypixel(source, self, glyph, agg)
File "/scratch/oms/projects/datashader/datashader/core.py", line 1192, in bypixel
return bypixel.pipeline(source, schema, canvas, glyph, agg)
File "/scratch/oms/projects/datashader/datashader/utils.py", line 94, in __call__
return lk[typ](head, *rest, **kwargs)
File "/scratch/oms/projects/datashader/datashader/data_libraries/dask.py", line 19, in dask_pipeline
dsk, name = glyph_dispatch(glyph, df, schema, canvas, summary, cuda=cuda)
File "/scratch/oms/projects/datashader/datashader/utils.py", line 97, in __call__
return lk[cls](head, *rest, **kwargs)
File "/scratch/oms/projects/datashader/datashader/data_libraries/dask.py", line 68, in default
compile_components(summary, schema, glyph, cuda=cuda)
File "/home/oms/.venv/sms/lib/python3.6/site-packages/toolz/functoolz.py", line 460, in memof
cache[k] = result = func(*args, **kwargs)
File "/scratch/oms/projects/datashader/datashader/compiler.py", line 57, in compile_components
calls = [_get_call_tuples(b, d, schema, cuda) for (b, d) in zip(bases, dshapes)]
File "/scratch/oms/projects/datashader/datashader/compiler.py", line 57, in <listcomp>
calls = [_get_call_tuples(b, d, schema, cuda) for (b, d) in zip(bases, dshapes)]
File "/scratch/oms/projects/datashader/datashader/compiler.py", line 83, in _get_call_tuples
return (base._build_append(dshape, schema, cuda),
File "/scratch/oms/projects/datashader/datashader/reductions.py", line 200, in _build_append
f = self.reduction._build_append(dshape, schema, cuda)
File "/scratch/oms/projects/datashader/datashader/reductions.py", line 115, in _build_append
return self._append
AttributeError: 'mean' object has no attribute '_append'
(sms) oms@tshikovski:~/projects/shadeMS$
Screenshots or screencasts of the bug in action
Issue Analytics
- Created 3 years ago
- Comments: 8 (8 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Nevermind, I can reproduce.

I'm afraid by(..., std()) is still subtly broken. It's producing incorrect numbers. See this example: the second-last raster above, the one made via a by('cat', std('c')) aggregation, consistently reports a max value of >10. The "c" column contains only random numbers from 0 to 9, so that value for std is impossible. The last raster, obtained by a plain std('c') reduction, looks to have sensible values.
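The "impossible" claim can be checked directly: by Popoviciu's inequality, the standard deviation of values confined to [0, 9] is bounded above by half the range, i.e. 4.5, so any reported std above 10 cannot be correct. A quick numpy check (the data here is made up for illustration, since the original example's random array isn't shown):

```python
import numpy as np

# Popoviciu's inequality: for values in [m, M], std <= (M - m) / 2.
# For integers 0..9 the bound is (9 - 0) / 2 = 4.5, so std > 10 is
# impossible regardless of the sample.
rng = np.random.default_rng(0)
for _ in range(1000):
    sample = rng.integers(0, 10, size=int(rng.integers(2, 100)))
    assert sample.std() <= 4.5

# The bound is attained only by an even split between the two extremes:
print(np.array([0, 9] * 5).std())  # 4.5
```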