Layered Histogram fails to represent null values

Issue

The method for making layered histograms suggested on altair-viz.github.io does not seem to account for null values (empty bins) within the range of the data: bins that should be empty are drawn as if they contained 1 observation.
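One plausible explanation, which is an assumption here and not something confirmed in the issue, is that a group-by style count only produces rows for bins that actually contain data, so an area mark has no zero-height point for an empty bin and interpolates across it. The sketch below uses made-up toy values and bin edges to contrast a dense histogram with a group-by count:

import numpy as np
import pandas as pd

# Toy data (hypothetical): nothing falls in the middle bin [1, 2).
values = np.array([0.1, 0.2, 0.3, 2.5, 2.6])
edges = np.array([0.0, 1.0, 2.0, 3.0])

# Dense histogram: every bin is reported, including the empty one.
dense_counts, _ = np.histogram(values, bins=edges)

# Group-by style count: only bins that contain data produce a row.
bin_index = np.digitize(values, edges) - 1
sparse_counts = pd.Series(bin_index).value_counts().sort_index()

print(dense_counts.tolist())    # [3, 0, 2]
print(sparse_counts.to_dict())  # {0: 3, 2: 2}

The empty bin is present with a count of 0 in the dense result but missing entirely from the group-by result.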


Examples

Altair:

import pandas as pd
import altair as alt
import numpy as np

np.random.seed(42)

# Generating Data
source = pd.DataFrame({
    'Trial A': np.random.normal(0, 0.8, 1000),
    'Trial B': np.random.normal(-2, 1, 1000),
    'Trial C': np.random.normal(3, 2, 1000)
})

alt.Chart(source).transform_fold(
    ['Trial A', 'Trial B', 'Trial C'],
    as_=['Experiment', 'Measurement']
).mark_area(
    opacity=0.3,
    interpolate='step'
).encode(
    alt.X('Measurement:Q', bin=alt.Bin(maxbins=100)),
    alt.Y('count()', stack=None),
    alt.Color('Experiment:N')
)

[Figure: layered_histogram_altair]

Seaborn:

import seaborn as sns

source_melt = source.melt(var_name='Experiment', value_name='Measurement')
sns.displot(
    source_melt,
    x='Measurement',
    binwidth=.2,
    hue='Experiment',
    aspect=2,
    element='step'
)

[Figure: layered_histogram_seaborn]


Solutions

The easiest solution would be to suggest another method. The one below solves the null values in bins, but is much more verbose, and requires manually setting the x-axis and the legend.

trial_a = alt.Chart(source).mark_bar(
    opacity=0.5, color='#c9d6e5'
).encode(
    alt.X('Trial A:Q', bin=alt.Bin(maxbins=50), axis=None),
    y='count()'
)

trial_b = alt.Chart(source).mark_bar(
    opacity=0.5, color='#fcdab9'
).encode(
    alt.X('Trial B:Q', bin=alt.Bin(maxbins=50), axis=None),
    y='count()'
)

trial_c = alt.Chart(source).mark_bar(
    opacity=0.5, color='#f7cccc'
).encode(
    alt.X('Trial C:Q', bin=alt.Bin(maxbins=100), axis=None),
    y='count()'
)

trial_a + trial_b + trial_c

[Figure: layered_histogram_altair_solution]

Another would be to fix the issue. If this is a vega-lite issue, I can move it there, but thought I’d get an opinion here first.

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
joelostblom commented, Feb 23, 2021

@harabat Yup, that is the right place. After you have made the changes, you can build the documentation locally with make html from the doc dir to make sure that it shows up as you intended. To make the histogram look like an area without the white lines between the bars, you can set the stroke encoding to the same as the color or, maybe even better, set binSpacing=0 inside mark_bar. I have seen a couple of other issues about the step interpolation not working exactly like a histogram, so I think you can make your suggested change to this example as well: https://altair-viz.github.io/gallery/scatter_marginal_hist.html
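The binSpacing=0 suggestion in the comment above might look like the following minimal sketch. It reuses the data and encodings from the issue's first example and only swaps the step-interpolated mark_area for mark_bar; the variable name chart is introduced here for illustration, and this is not necessarily the exact code that ended up in the documentation.

import altair as alt
import numpy as np
import pandas as pd

np.random.seed(42)

# Same synthetic data as in the issue's example
source = pd.DataFrame({
    'Trial A': np.random.normal(0, 0.8, 1000),
    'Trial B': np.random.normal(-2, 1, 1000),
    'Trial C': np.random.normal(3, 2, 1000)
})

chart = alt.Chart(source).transform_fold(
    ['Trial A', 'Trial B', 'Trial C'],
    as_=['Experiment', 'Measurement']
).mark_bar(
    opacity=0.3,
    binSpacing=0  # no gap between adjacent bars, so they read as one area
).encode(
    alt.X('Measurement:Q', bin=alt.Bin(maxbins=100)),
    alt.Y('count()', stack=None),  # overlay the layers instead of stacking
    alt.Color('Experiment:N')
)

Because each bar is drawn only for a bin that has data, an empty bin is simply left blank, and the fold transform keeps the shared x-axis and the legend automatic.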

0 reactions
joelostblom commented, Mar 23, 2022

This was resolved in #2419

