[BUG] vbar_stack with DataFrame as source ignores entire row if first data column is NaN
Software versions
- OS: Linux 5.4.6
- Browser: Chrome 78.0.3904.108 (Official Build) (64-bit)
- Python: 3.8.1
- JupyterLab: 1.2.4
- Pandas: 0.25.3
- Bokeh: 1.4.0
Issue
When plotting a vbar_stack, the entire row in the data source is ignored if the first input column contains a NaN in a pandas.DataFrame.
If a standard Python dict is used as a data source, the output is plotted as expected.
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure, show

# Case 1: plain dict as data source (plotted as expected)
data = dict(index=[1, 2, 3, 4],
            a=[4, 5, None, 7],
            b=[9, None, 7, 6])
source = ColumnDataSource(data)
f = figure()
f.vbar_stack(x="index", stackers=["a", "b"], source=source,
             width=0.5, color=["red", "blue"])
show(f)

# Case 2: the same data as a pandas DataFrame
# (the row with NaN in the first stacker column "a" is ignored)
source = ColumnDataSource(pd.DataFrame(index=[1, 2, 3, 4],
                                       data=[dict(a=4, b=9),
                                             dict(a=5, b=None),
                                             dict(a=None, b=7),
                                             dict(a=7, b=6),
                                             ]))
f = figure()
f.vbar_stack(x="index", stackers=["a", "b"], source=source,
             width=0.5, color=["red", "blue"])
show(f)
Expected behavior with dict as data source
Unexpected behavior with DataFrame as data source
Mitigation
I currently let pandas fill the NaNs with zeros.
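A minimal sketch of that mitigation, assuming the same data as in the report. The names `stackers` and `filled` are only illustrative, and filling with 0 is an assumption that a missing value should be drawn as a zero-height bar segment:

```python
import pandas as pd

# Same data as in the report: NaN in "a" (index 3) and in "b" (index 2).
df = pd.DataFrame(
    index=[1, 2, 3, 4],
    data=[
        dict(a=4, b=9),
        dict(a=5, b=None),
        dict(a=None, b=7),
        dict(a=7, b=6),
    ],
)

stackers = ["a", "b"]

# Replace missing values in the stacked columns only, leaving any
# other columns untouched. A NaN becomes a zero-height segment.
filled = df.copy()
filled[stackers] = filled[stackers].fillna(0)
```

The `filled` frame can then be passed to `ColumnDataSource` in place of the original DataFrame.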
Thanks for Bokeh, guys!
Issue Analytics
- Created 4 years ago
- Comments:18 (11 by maintainers)
Top GitHub Comments
@pohlt that case (“always in a data frame”) I think we can probably make consistent. Earlier I was really referring to the list vs data frame differences, which cannot always be bridged.
FWIW I think I’d start trying to “make things work” as in the vega case. Otherwise we are looking at:
Thanks @bryevdv
I believe that at least raising an error would be a nice thing to do. For me, it took some time (and cross-verification with another plot) to find out that I was missing half of the data in one column.
The mitigation can be advised in the error message.
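As a rough user-side sketch of that idea, the check below raises an error that names the affected columns and suggests the mitigation. `check_stackers` is a hypothetical helper written for illustration; it is not part of Bokeh and does not reflect how Bokeh handles this internally:

```python
import pandas as pd


def check_stackers(df, stackers):
    """Raise ValueError if any stacker column contains NaN.

    Hypothetical helper for illustration only, not a Bokeh API.
    """
    bad = [name for name in stackers if df[name].isna().any()]
    if bad:
        raise ValueError(
            f"NaN values found in stacker column(s) {bad}; "
            "consider filling them first, e.g. df.fillna(0)"
        )


df = pd.DataFrame(dict(a=[4, 5, None, 7], b=[9, None, 7, 6]), index=[1, 2, 3, 4])

try:
    check_stackers(df, ["a", "b"])
except ValueError as err:
    message = str(err)  # names both columns and suggests fillna
```

Calling it before building the `ColumnDataSource` would surface the problem immediately instead of silently dropping bars.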