geom_density ignores "weight" argument

Hi, I noticed that when running pn.ggplot(df, pn.aes(x="x", weight="w")) + pn.geom_density() the weight is ignored. I am using plotnine version 0.6.0.

I confirmed this by expanding the rows with df.reindex(df.index.repeat(df["w"])) and plotting the result without the weight aesthetic, which gives a different density.

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
has2k1 commented, Apr 29, 2020

I do not think you can do that, because in a kernel density algorithm there are two ways to change how much any distinct value contributes to the final density:

  1. Its frequency (i.e. addition)
  2. Its weight (i.e. multiplication, which is a shortcut for addition)

For the stability of the algorithms, the weights (multiplication) are normalised to the [0, 1] domain for any given density computation. That shuts out option 2, leaving you with option 1.
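The effect of that normalisation can be seen outside plotnine. The sketch below uses scipy.stats.gaussian_kde as a stand-in (an assumption: it is not claimed to be the code path plotnine uses) and made-up data. Giving every point the same weight of 5 produces the same curve as giving no weights at all, which is why a constant per-group weight changes nothing.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Illustrative data only; not taken from the issue.
rng = np.random.default_rng(0)
x = rng.normal(size=50)
grid = np.linspace(-3, 3, 101)

# Plain KDE, every point contributes equally.
unweighted = gaussian_kde(x)(grid)

# Every point gets weight 5. The weights are normalised to sum to 1,
# so this is again "every point contributes equally".
constant_weight = gaussian_kde(x, weights=np.full(x.size, 5.0))(grid)

print(np.allclose(unweighted, constant_weight))
```

Only relative differences between weights inside one density computation survive the normalisation.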

So maybe you can make it easier by creating a helper function using something like

import numpy as np

def weight_to_frequency(df, wt, precision=3):
    # Normalise the weights, then scale to integer replication counts
    ns = np.round((wt / sum(wt)) * (10**precision)).astype(int)
    idx = np.repeat(df.index, ns)            # selection indices
    df = df.loc[idx].reset_index(drop=True)  # replication
    return df

to come up with integer replication factors.
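As a usage sketch, here is the helper applied to a two-row frame. The data and the precision=1 setting are made up for illustration, and the helper is restated with NumPy arrays (a small variation on the version above) so the block runs on its own.

```python
import numpy as np
import pandas as pd

def weight_to_frequency(df, wt, precision=3):
    # Variant of the helper above that converts inputs to NumPy arrays first.
    wt = np.asarray(wt, dtype=float)
    ns = np.round(wt / wt.sum() * 10**precision).astype(int)  # replication counts
    idx = np.repeat(df.index.to_numpy(), ns)                  # selection indices
    return df.loc[idx].reset_index(drop=True)                 # replicated rows

# Hypothetical data: the second row should count four times as much as the first.
df = pd.DataFrame({"x": [0.1, 0.9], "wt": [1, 4]})
expanded = weight_to_frequency(df, df["wt"], precision=1)

print(len(expanded))
print(expanded["x"].value_counts())
```

The total number of rows is roughly 10**precision, so a higher precision tracks the weight ratios more closely at the cost of a larger frame.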

1 reaction
pkhokhlov commented, Apr 29, 2020

I encountered this issue as well. Please see the example below:

import pandas as pd
import plotnine as pn
import numpy as np
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200,
                           n_features=1,
                           n_informative=1,
                           n_redundant=0,
                           n_clusters_per_class=1,
                           random_state=2)

df = pd.DataFrame({"x" : X.T[0], "y" : y})
df.y = df.y.astype("category")

df["wt"] = np.where(df["y"] == 1, 5, 1)

(pn.ggplot(df, pn.aes("x", fill="y")) +
 pn.geom_density(position="fill") +
 pn.theme_seaborn(style="whitegrid"))

This produces the following plot: [figure: stacked_density1]

If we do:

(pn.ggplot(df, pn.aes("x", fill="y", weight="wt")) +
 pn.geom_density(position="fill") +
 pn.theme_seaborn(style="whitegrid"))

or

(pn.ggplot(df, pn.aes("x", fill="y")) +
 pn.geom_density(pn.aes(weight="wt"), position="fill") +
 pn.theme_seaborn(style="whitegrid"))

we get the same plot. However, if we do:

df2 = df.reindex(df.index.repeat(df["wt"]))

(pn.ggplot(df2, pn.aes("x", "stat(count)", fill="y")) +
 pn.geom_density(position="fill") +
 pn.theme_seaborn(style="whitegrid"))

we get: [figure: stacked_density2]

which is the expected result.

@has2k1 is there a way to produce the last plot above using weight or without repeating rows in a dataframe?
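One possible direction, sketched outside plotnine under stated assumptions: when the weight is constant within each group, the per-group density is unchanged, so the weight only matters through the group's total count. The block below uses scipy.stats.gaussian_kde and made-up data (the names x0, x1, w0, w1 are hypothetical) to compute the filled share of one group by scaling each density by its weighted count. It is an approximation of the replicated-rows plot, not a reproduction: replicating rows also changes the default bandwidth, which this sketch does not.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical two-group data standing in for the y == 0 and y == 1 groups.
rng = np.random.default_rng(0)
x0 = rng.normal(-1.0, 1.0, 100)
x1 = rng.normal(1.0, 1.0, 100)
w0, w1 = 1.0, 5.0  # constant weight per group, as in the example above

grid = np.linspace(-4, 4, 200)
d0 = gaussian_kde(x0)(grid)
d1 = gaussian_kde(x1)(grid)

# "count" = density times the (weighted) number of observations in the group.
count0 = d0 * (w0 * x0.size)
count1 = d1 * (w1 * x1.size)

# Share of group 1 at each grid point, with and without weights.
weighted_share1 = count1 / (count0 + count1)
unweighted_share1 = (d1 * x1.size) / (d0 * x0.size + d1 * x1.size)
```

The resulting weighted_share1 array could then be drawn with an area geom rather than geom_density, which avoids expanding the data frame.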
