Layered Histogram fails to represent null values
Issue
The method for making layered histograms suggested on altair-viz.github.io does not seem to account for empty bins (null counts) within the range of the data: bins that should be empty are drawn as if they held 1 observation.
Examples
Altair:
import pandas as pd
import altair as alt
import numpy as np

np.random.seed(42)

# Generating Data
source = pd.DataFrame({
    'Trial A': np.random.normal(0, 0.8, 1000),
    'Trial B': np.random.normal(-2, 1, 1000),
    'Trial C': np.random.normal(3, 2, 1000)
})

alt.Chart(source).transform_fold(
    ['Trial A', 'Trial B', 'Trial C'],
    as_=['Experiment', 'Measurement']
).mark_area(
    opacity=0.3,
    interpolate='step'
).encode(
    alt.X('Measurement:Q', bin=alt.Bin(maxbins=100)),
    alt.Y('count()', stack=None),
    alt.Color('Experiment:N')
)
Seaborn:
import seaborn as sns

# Reshape to long form: one row per (Experiment, Measurement) pair
source_melt = source.melt(var_name='Experiment', value_name='Measurement')

sns.displot(
    source_melt,
    x='Measurement',
    binwidth=.2,
    hue='Experiment',
    aspect=2,
    element='step'
)
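To check which bins are really empty without relying on either plotting library, the counts can be computed directly. This is only a minimal sketch: the shared bin edges of width 0.2 are an assumption borrowed from the seaborn call above, not the edges Altair would pick for maxbins=100, and the names edges, counts and empty_bins are introduced here for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
source = pd.DataFrame({
    'Trial A': np.random.normal(0, 0.8, 1000),
    'Trial B': np.random.normal(-2, 1, 1000),
    'Trial C': np.random.normal(3, 2, 1000)
})

# Shared bin edges covering every trial (assumed width of 0.2)
lo = np.floor(source.min().min())
hi = np.ceil(source.max().max())
edges = np.arange(lo, hi + 0.2, 0.2)

# Count observations per bin for each trial; empty bins appear as explicit zeros
counts = pd.DataFrame(
    {name: np.histogram(source[name], bins=edges)[0] for name in source.columns},
    index=pd.Index(edges[:-1].round(1), name='bin_start')
)

# Number of bins in the shared range where each trial has no observations
empty_bins = (counts == 0).sum()
print(empty_bins)
```

Any bin counted as zero here should be drawn with zero height in a correct layered histogram.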
Solutions
The easiest solution would be to suggest another method. The one below draws empty bins correctly, but it is much more verbose and requires manually setting the x-axis and the legend.
trial_a = alt.Chart(source).mark_bar(
    opacity=0.5, color='#c9d6e5'
).encode(
    alt.X('Trial A:Q', bin=alt.Bin(maxbins=50), axis=None),
    y='count()'
)

trial_b = alt.Chart(source).mark_bar(
    opacity=0.5, color='#fcdab9'
).encode(
    alt.X('Trial B:Q', bin=alt.Bin(maxbins=50), axis=None),
    y='count()'
)

trial_c = alt.Chart(source).mark_bar(
    opacity=0.5, color='#f7cccc'
).encode(
    alt.X('Trial C:Q', bin=alt.Bin(maxbins=100), axis=None),
    y='count()'
)

# Layer the three histograms on top of each other
trial_a + trial_b + trial_c
Another would be to fix the issue. If this is a Vega-Lite issue, I can move it there, but I thought I’d get an opinion here first.
Issue Analytics
- State:
- Created 3 years ago
- Comments:5 (3 by maintainers)
Top GitHub Comments
@harabat Yup that is the right place. After you have made the changes you can build the documentation locally to make sure that it shows up as you intended with make html from the doc dir. To make the histogram look like an area without the white lines between the bars, you can set the stroke encoding to the same as the color or, maybe even better, set binSpacing=0 inside mark_bar. I have seen a couple of other issues about the step-interpolation not working exactly as a histogram, so I think you can make your suggested change to this example as well: https://altair-viz.github.io/gallery/scatter_marginal_hist.html
This was resolved in #2419