[BUG] vbar_stack with DataFrame as source ignores entire row if first data column is NaN

Software versions

  • OS: Linux 5.4.6
  • Browser: Chrome 78.0.3904.108 (Official Build) (64-bit)
  • Python: 3.8.1
  • JupyterLab: 1.2.4
  • Pandas: 0.25.3
  • Bokeh: 1.4.0

Issue

When plotting with vbar_stack and a pandas.DataFrame as the data source, an entire row is ignored if its value in the first stacker column is NaN.

If a standard Python dict is used as a data source, the output is plotted as expected.
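The two sources likely diverge as soon as the DataFrame is built: a plain dict keeps `None` as-is, while pandas stores the same column as `float64` and turns `None` into NaN. The sketch below only demonstrates that conversion with the report's own records; the names `records` and `dict_columns` are illustrative.

```python
import math

import pandas as pd

# Same records as in the report; None marks a missing value.
records = [dict(a=4, b=9), dict(a=5, b=None), dict(a=None, b=7), dict(a=7, b=6)]

# Plain-dict columns keep None untouched.
dict_columns = {
    "a": [r["a"] for r in records],
    "b": [r["b"] for r in records],
}
print(dict_columns["a"])  # [4, 5, None, 7]

# pandas infers float64 for these columns and stores None as NaN.
df = pd.DataFrame(index=[1, 2, 3, 4], data=records)
print(df["a"].dtype)               # float64
print(math.isnan(df.loc[3, "a"]))  # True
```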

import pandas as pd
from bokeh.plotting import figure, show
from bokeh.models import ColumnDataSource

data = dict(index=[1, 2, 3, 4],
            a=[4, 5, None, 7],
            b=[9, None, 7, 6])

source = ColumnDataSource(data)

f = figure()
f.vbar_stack(x="index", stackers=["a", "b"], source=source,
             width=0.5, color=["red", "blue"])
show(f)

source = ColumnDataSource(pd.DataFrame(index=[1, 2, 3, 4],
                                       data=[dict(a=4, b=9),
                                             dict(a=5, b=None),
                                             dict(a=None, b=7),
                                             dict(a=7, b=6),
                                            ]))

f = figure()
f.vbar_stack(x="index", stackers=["a", "b"], source=source,
             width=0.5, color=["red", "blue"])
show(f)
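The "whole row disappears" pattern is consistent with how stacked bars are built: each segment's bottom is the running sum of the stacker columns before it, so a NaN in the first column makes every later bottom and top NaN too. The sketch below emulates that in pure Python. It assumes (not confirmed in the thread) that the stacking is evaluated in the browser by BokehJS, where `null` behaves like 0 in arithmetic while NaN propagates; `stack_bounds` and `drawable` are hypothetical helpers, not Bokeh APIs.

```python
import math


def stack_bounds(rows, stackers):
    """Hypothetical helper: (bottom, top) pairs of each stacked segment, per row.

    Each segment starts where the previous one ended (a running sum).
    None is treated as 0 (assumption: mirrors JavaScript, where null + 5 == 5);
    NaN is left alone, so it propagates into every later bottom and top.
    """
    all_bounds = []
    for row in rows:
        bottom = 0.0
        bounds = []
        for name in stackers:
            value = row[name]
            if value is None:
                value = 0.0
            top = bottom + value
            bounds.append((bottom, top))
            bottom = top
        all_bounds.append(bounds)
    return all_bounds


def drawable(segment):
    """Hypothetical check: a segment can only be drawn if both ends are numbers."""
    return not any(math.isnan(v) for v in segment)


nan = float("nan")
dict_rows = [dict(a=4, b=9), dict(a=5, b=None), dict(a=None, b=7), dict(a=7, b=6)]
df_rows = [dict(a=4, b=9), dict(a=5, b=nan), dict(a=nan, b=7), dict(a=7, b=6)]

for label, rows in [("dict", dict_rows), ("DataFrame", df_rows)]:
    visible = [[drawable(s) for s in bounds] for bounds in stack_bounds(rows, ["a", "b"])]
    print(label, visible)
# dict [[True, True], [True, True], [True, True], [True, True]]
# DataFrame [[True, True], [True, False], [False, False], [True, True]]
```

In the DataFrame case, row 2 loses only the "b" segment (NaN is on top), while row 3 loses both segments because the NaN sits in the first stacker, which matches the issue title.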

Expected behavior with dict as data source

[Screenshot: expected output, dict as data source]

Unexpected behavior with DataFrame as data source

[Screenshot: unexpected output, DataFrame as data source]

Mitigation

I currently let Pandas fill the NaNs with zeros.
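A minimal sketch of that mitigation, using the DataFrame from the report. Filling only the stacker columns (via a per-column dict passed to `DataFrame.fillna`) is a design choice added here, so that any other columns keep their NaNs; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(index=[1, 2, 3, 4],
                  data=[dict(a=4, b=9), dict(a=5, b=None),
                        dict(a=None, b=7), dict(a=7, b=6)])

# Fill missing values with 0, but only in the columns that get stacked.
stackers = ["a", "b"]
filled = df.fillna({name: 0 for name in stackers})
print(filled)

# The filled frame is then used as before, e.g.:
# source = ColumnDataSource(filled)
```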

Thanks for Bokeh, guys!

Issue Analytics

  • State: open
  • Created 4 years ago
  • Comments: 18 (11 by maintainers)

Top GitHub Comments

bryevdv commented on Jan 9, 2020 (1 reaction)

@pohlt I think we can probably make that case (“always in a data frame”) consistent. Earlier I was really referring to the differences between lists and data frames, which cannot always be bridged.

FWIW I think I’d start trying to “make things work” as in the vega case. Otherwise we are looking at:

  • scanning all the data in Python (potentially expensive), but
  • only in the case of stacked things (a bit complicated)

xblahoud commented on Dec 17, 2021

Thanks @bryevdv

I believe that at least raising an error would be a good idea. It took me some time (and cross-checking against another plot) to find out that I was missing half of the data in one column.

The error message could suggest the mitigation.
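What such an error might look like, sketched as a user-side pre-flight check rather than anything Bokeh provides: `check_stackers` is a hypothetical helper that scans the stacker columns for NaN and raises a `ValueError` whose message suggests filling with zeros.

```python
import pandas as pd


def check_stackers(df, stackers):
    """Hypothetical pre-flight check (not part of Bokeh).

    Raises ValueError if any stacker column contains NaN, and points
    to the fill-with-zeros mitigation in the message.
    """
    missing = {name: int(df[name].isna().sum()) for name in stackers}
    bad = {name: count for name, count in missing.items() if count}
    if bad:
        details = ", ".join(f"{name!r} ({count} missing)" for name, count in bad.items())
        raise ValueError(
            f"Stacked columns contain NaN values: {details}. "
            "NaN propagates through the stack and hides bars; "
            "consider filling them first, e.g. df.fillna(0)."
        )


df = pd.DataFrame(index=[1, 2, 3, 4],
                  data=[dict(a=4, b=9), dict(a=5, b=None),
                        dict(a=None, b=7), dict(a=7, b=6)])

try:
    check_stackers(df, ["a", "b"])
except ValueError as err:
    print(err)
```

This is a full scan of each stacked column, i.e. the "scanning all the data in Python" cost mentioned earlier in the thread, but only for the columns passed as stackers.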
