Unintuitive: order matters in specifying mixed effects?
Hi - I've recently been using bambi 0.6.3 with pymc3 3.11.2 and python 3.8. I spent a while struggling to specify mixed effects: interactions like I'd do in patsy (* or :) worked fine, but | gave me a slope+intercept per observation and OOMed on real data. After a fair bit of unsuccessful effort (I thought bambi was treating my continuous variable as a category, or my category as continuous), I realized I need to write feature|group, not group|feature. It's very possible this is standard practice in other Bayesian linear model packages, but I find this very unintuitive and suspect others might feel similarly? In case this is a bug, here's code to reproduce on a toy dataset:
# debug on toy data
import bambi as bmb
import pandas as pd
from sklearn.datasets import make_regression

xs, ys = make_regression(
    n_samples=500,
    n_features=5,
    n_informative=5,
    n_targets=1,
    bias=0.0,
    effective_rank=None,
    tail_strength=0.5,
    noise=5e1,
    shuffle=True,
    coef=False,
    random_state=42,
)
feat_names = {i: f"f{i}" for i in range(5)}
xys = pd.concat(
    [pd.DataFrame(xs).rename(columns=feat_names), pd.Series(ys).rename("y")], axis=1
)
# add hierarchy
xys["group_i"] = xys.index // 50
xys["group"] = "g_" + xys.group_i.astype("str")
# add group intercept
xys["y"] = xys.y + 30 * xys.group_i
# add group slope
xys["y"] = xys.y + 1e1 * xys.f0 * xys.group_i
# standardize y to zero mean and unit variance
xys["y"] = (xys.y - xys.y.mean()) / xys.y.std()
# expected behavior: mixed effects
debug_model = bmb.Model(
    "y ~ f0 + f1 + f2 + f3 + f4 + (f0|group_i)",
    data=xys,
)
debug_model
debug_model.build()
debug_model.graph()
# unintuitive behavior: one group per observation!
debug_model = bmb.Model(
    "y ~ f0 + f1 + f2 + f3 + f4 + (group_i|f0)",
    data=xys,
)
debug_model
debug_model.build()
debug_model.graph()
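expected: [model graph for (f0|group_i), image not included]
unexpected: [model graph for (group_i|f0), image not included]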
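A rough sketch of why the two formulas behave so differently, assuming the lme4-style reading in which whatever sits to the right of | is the grouping factor and gets one intercept and one slope per distinct value. The helpers grouping_factor and n_levels are hypothetical names made up for this illustration and are not part of bambi or formulae; the data only mimics the shape of the toy dataset above (500 rows, 10 groups of 50).

```python
import random


def grouping_factor(term):
    """Return the name to the right of `|` in a term like "(f0|group_i)"."""
    _expr, factor = term.strip("()").split("|")
    return factor.strip()


def n_levels(values):
    """Count distinct values, i.e. the groups a grouping factor would create."""
    return len(set(values))


# Stand-in for the toy data: one continuous feature, ten groups of 50 rows.
random.seed(42)
data = {
    "f0": [random.gauss(0, 1) for _ in range(500)],
    "group_i": [i // 50 for i in range(500)],
}

for term in ("(f0|group_i)", "(group_i|f0)"):
    factor = grouping_factor(term)
    levels = n_levels(data[factor])
    # Assumption: each level gets an intercept and a slope, hence the factor 2.
    print(f"{term}: grouped by {factor!r}, {levels} levels, "
          f"{2 * levels} group-specific coefficients")
```

With group_i on the right there are 10 levels, so about 20 group-specific coefficients. With the continuous f0 on the right, essentially every row is its own level, which matches the one-group-per-observation graph and the memory blowup on real data. Running n_levels on the intended grouping column before building a model is a cheap sanity check.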
Appreciate your help, and I'm a fan of the package 😃 Rapid progress since I last used it this ~spring. Looking forward to splines!
Yes, that would help - just trying to make it so the next person doesn’t end up very confused why they have one group per observation. Feel free to close.
Maybe this can help? https://bambinos.github.io/formulae/notebooks/getting_started.html#Group-specific-effects