[BUG] vbar_stack with DataFrame as source ignores entire row if first data column is NaN


Software versions

OS: Linux 5.4.6
Browser: Chrome 78.0.3904.108 (Official Build) (64-bit)
Python: 3.8.1
JupyterLab: 1.2.4
Pandas: 0.25.3
Bokeh: 1.4.0

Issue

When plotting a vbar_stack, the entire row in the data source is ignored if the first input column contains a NaN in a pandas.DataFrame.

If a standard Python dict is used as a data source, the output is plotted as expected.

import pandas as pd

from bokeh.plotting import figure, show
from bokeh.models import ColumnDataSource

# Case 1: plain dict as data source (missing values given as None)
data = dict(index=[1, 2, 3, 4],
            a=[4, 5, None, 7],
            b=[9, None, 7, 6])

source = ColumnDataSource(data)

f = figure()
f.vbar_stack(x="index", stackers=["a", "b"], source=source,
             width=0.5, color=["red", "blue"])
show(f)

# Case 2: the same values as a pandas DataFrame
source = ColumnDataSource(pd.DataFrame(index=[1, 2, 3, 4],
                                       data=[dict(a=4, b=9),
                                             dict(a=5, b=None),
                                             dict(a=None, b=7),
                                             dict(a=7, b=6),
                                            ]))

f = figure()
f.vbar_stack(x="index", stackers=["a", "b"], source=source,
             width=0.5, color=["red", "blue"])
show(f)
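A small check, using only pandas, of how the two inputs differ before Bokeh ever sees them: the dict keeps `None` as-is, while the DataFrame stores the same gap as a float `NaN`. This is only a sketch of the input difference; whether it fully explains the rendering difference is an assumption, not something confirmed in the issue.

```python
import pandas as pd

# Same values as in the report, without the plotting code.
data = dict(a=[4, 5, None, 7], b=[9, None, 7, 6])
df = pd.DataFrame(data, index=[1, 2, 3, 4])

# The dict still holds a Python None in the third row of "a" ...
dict_value_is_none = data["a"][2] is None

# ... while pandas has converted the same entry to NaN.
frame_nan_mask = df["a"].isna().tolist()

print(dict_value_is_none)
print(frame_nan_mask)
```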

Expected behavior with dict as data source

[Screenshot: plot produced with a dict as data source]

Unexpected behavior with DataFrame as data source

[Screenshot: plot produced with a DataFrame as data source]

Mitigation

I currently let Pandas fill the NaNs with zeros.
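A minimal sketch of that workaround, assuming the same columns `a` and `b` as in the example above. The variable name `filled` is only illustrative; the resulting frame would then be passed to `ColumnDataSource` in place of the original one.

```python
import pandas as pd

df = pd.DataFrame({"a": [4, 5, None, 7],
                   "b": [9, None, 7, 6]},
                  index=[1, 2, 3, 4])

# Replace every NaN with 0 so that each stacker column is fully numeric.
filled = df.fillna(0)

print(filled)
```

Note that a zero-height bar is not quite the same thing as a missing value, so this is only appropriate when a missing entry can reasonably be read as "nothing to stack".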

Thanks for Bokeh, guys!

Issue Analytics

Created 4 years ago
Comments: 18 (11 by maintainers)

Top GitHub Comments

1 reaction

bryevdv commented, Jan 9, 2020

@pohlt that case (“always in a data frame”) I think we can probably make consistent. Earlier I was really referring to the list vs data frame differences, which cannot always be bridged.

FWIW I think I’d start trying to “make things work” as in the vega case. Otherwise we are looking at:

scanning all the data in Python (potentially expensive), but
only in the case of stacked things (a bit complicated)

0 reactions

xblahoud commented, Dec 17, 2021

Thanks @bryevdv

I believe that at least raising an error would be a nice thing to do. For me, it took some time (and cross-verification with another plot) to find out that I was missing half of the data in one column.

The mitigation can be advised in the error message.
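Until something like this exists in Bokeh itself, a user-side check along these lines could catch the problem early. This is only a sketch: `check_stackers` is a hypothetical helper, not part of Bokeh, and it scans only the columns that will be stacked.

```python
import pandas as pd


def check_stackers(df, stackers):
    """Hypothetical helper: fail loudly if a stacker column contains NaN."""
    bad = [name for name in stackers if df[name].isna().any()]
    if bad:
        raise ValueError(
            f"NaN values found in stacker columns {bad}; "
            "consider calling df.fillna(0) before plotting"
        )
    return df


df = pd.DataFrame({"a": [4, 5, None, 7],
                   "b": [9, None, 7, 6]},
                  index=[1, 2, 3, 4])

try:
    check_stackers(df, ["a", "b"])
except ValueError as err:
    message = str(err)
    print(message)
```

Calling `check_stackers(df.fillna(0), ["a", "b"])` returns the frame unchanged, so the check can sit directly in front of the `ColumnDataSource` construction.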