`Model.predict()` generates unexpected out-of-sample predictions for a mixed effects model

Hello,

First of all, thanks for the great work on this project. I've been using bambi a lot and it has been super helpful!

I’m currently facing a (potential) issue when trying to make out-of-sample predictions for a logistic regression model built with the following formula:

y ~ x1 + x2 + x3 + (0 + x2|x1) + (0 + x3|x1)

where x1 and x2 are categorical variables with two levels each and x3 is a continuous variable.

The out-of-sample data I'm trying to make predictions for looks like this (simplified example):

x1	x2	x3
0	0	0
0	0	0.5
0	0	1
0	0	1.5
1	0	0
1	0	0.5
1	0	1
1	0	1.5
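For anyone reproducing this, here is a minimal sketch of building that grid as a pandas DataFrame. The name out_of_sample_data matches the predict call in the issue; declaring x1 and x2 as pd.Categorical with both levels [0, 1] is my own assumption, meant to keep the categories of the new data aligned with the training data even though x2 only takes the value 0 here.

```python
import itertools

import pandas as pd

# x3 values from the table above; x2 is fixed at 0 for every row
x3_values = [0.0, 0.5, 1.0, 1.5]
rows = [(x1, 0, x3) for x1, x3 in itertools.product([0, 1], x3_values)]
out_of_sample_data = pd.DataFrame(rows, columns=["x1", "x2", "x3"])

# Assumption: declare both levels explicitly so the encoding does not depend
# on which levels happen to appear in the new data
for col in ["x1", "x2"]:
    out_of_sample_data[col] = pd.Categorical(out_of_sample_data[col], categories=[0, 1])

print(out_of_sample_data)
```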

There was no error when running model.predict(iData, data=out_of_sample_data, kind='mean'). However, the spaghetti plot I generated from the posterior predictions looked off for x1 == 1: the variance was much larger than I expected. (I noticed this because I had manually made a plot of the 0.5 decision boundary for x3, i.e. the mean and HDI of the x3 value at which the probability of a positive outcome is 50%, and it didn't match what I saw in the spaghetti plot.)
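The decision-boundary check described above can be sketched as follows. For a fixed x1 (and x2), the linear predictor is linear in x3, eta = a + s * x3, where a collects every term that does not involve x3 and s is the total x3 slope for that group. P(y = 1) = 0.5 exactly when eta = 0, i.e. at x3 = -a / s. The sketch computes that value per posterior draw and summarizes it with a mean and a sample-based HDI. The helper names are hypothetical, and the draws are simulated placeholders, not results from the issue.

```python
import numpy as np


def hdi(samples, prob=0.94):
    """Narrowest interval containing a fraction `prob` of the samples."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    k = int(np.floor(prob * n))  # index offset spanned by each candidate interval
    widths = x[k:] - x[: n - k]  # width of every candidate interval
    i = int(np.argmin(widths))
    return x[i], x[i + k]


def decision_boundary(a, s):
    """x3 at which eta = a + s * x3 equals 0, i.e. P(y = 1) = 0.5."""
    return -np.asarray(a, dtype=float) / np.asarray(s, dtype=float)


# Simulated stand-ins for posterior draws of one group's intercept and x3 slope
rng = np.random.default_rng(0)
a_draws = rng.normal(-1.0, 0.1, size=4000)
s_draws = rng.normal(2.0, 0.1, size=4000)

x3_star = decision_boundary(a_draws, s_draws)
lower, upper = hdi(x3_star)
print(f"mean={x3_star.mean():.3f}, 94% HDI=({lower:.3f}, {upper:.3f})")
```

The ratio -a / s becomes unstable if the slope's posterior puts noticeable mass near zero, so this summary is only meaningful when the x3 slope is clearly away from 0.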

I then had a look at the code and noticed that the Z matrix generated in the predict method in models.py looked different from what I expected. Here’s the code bit I’m referring to (last line):

        if self._design.group:
            if in_sample:
                Z = self._design.group.design_matrix
            else:
                Z = self._design.group._evaluate_new_data(data).design_matrix
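Conceptually, the linear predictor of a GLMM is eta = X beta + Z b, so each column of Z has to line up with the matching entry of the group-specific coefficient vector b. The plain NumPy sketch below (not bambi's actual implementation; the function names and random arrays are hypothetical placeholders) shows why a column-order mismatch silently changes predictions without raising an error.

```python
import numpy as np


def inv_logit(eta):
    """Logistic function mapping the linear predictor to P(y = 1)."""
    return 1.0 / (1.0 + np.exp(-eta))


def predict_mean(X, beta, Z, b):
    """P(y = 1) per draw and row; beta is (draws, p), b is (draws, q)."""
    eta = beta @ X.T + b @ Z.T  # shape (draws, n)
    return inv_logit(eta)


rng = np.random.default_rng(1)
n, p, q, draws = 8, 3, 4, 5
X = rng.normal(size=(n, p))         # placeholder common-effects design
Z = rng.normal(size=(n, q))         # placeholder group-specific design
beta = rng.normal(size=(draws, p))  # placeholder posterior draws
b = rng.normal(size=(draws, q))

perm = [0, 2, 1, 3]  # swap the second and third columns
base = predict_mean(X, beta, Z, b)
both = predict_mean(X, beta, Z[:, perm], b[:, perm])  # Z and b permuted together
z_only = predict_mean(X, beta, Z[:, perm], b)           # Z permuted, b left as-is
print(np.allclose(base, both), np.allclose(base, z_only))
```

Permuting Z and b together leaves every prediction unchanged, while permuting only Z pairs columns with the wrong coefficients, which is consistent with the distorted spaghetti plot described above.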

What I got for Z was the following:


1	0	0	0
1	0	0.5	0
1	0	1	0
1	0	1.5	0
0	1	0	0
0	1	0	0.5
0	1	0	1
0	1	0	1.5

…but what I was expecting (after trying to make sense of it) was this:


1	0	0	0
1	0	0.5	0
1	0	1	0
1	0	1.5	0
0	1	0	0
0	1	0	0.5
0	1	0	1
0	1	0	1.5
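A minimal sketch of how the layout above can be built by hand: each group-specific term contributes one column per level of x1, holding the term's value for rows in that group and 0 elsewhere, with the (0 + x2|x1) columns first and the (0 + x3|x1) columns second. The helper name is hypothetical, and I'm assuming, as the matrix suggests, that the x2 term contributes a single column that equals 1 in every row of this example.

```python
import numpy as np

x1 = np.array([0, 0, 0, 0, 1, 1, 1, 1])
x3 = np.array([0.0, 0.5, 1.0, 1.5] * 2)
# Assumption: the encoded x2 term is one column equal to 1 in every row here
x2_term = np.ones_like(x3)


def group_columns(values, groups, levels=(0, 1)):
    """One column per group level: the term's value inside that group, else 0."""
    return np.column_stack([values * (groups == level) for level in levels])


# Term order: (0 + x2|x1) first, then (0 + x3|x1)
Z_expected = np.hstack([group_columns(x2_term, x1), group_columns(x3, x1)])
print(Z_expected)
```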

so basically the second and third columns swapped. I then added the following line:

Z[:, [1, 2]] = Z[:, [2, 1]]

to achieve that, and the spaghetti plot I generated afterwards matched my expectation.
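The one-line workaround relies on how NumPy handles list indexing. A small check on the matrix shown above (hard-coded below) shows that the two columns are exchanged cleanly and that the swap is its own inverse, so the same line converts either layout into the other.

```python
import numpy as np

# The 8 x 4 Z matrix listed above
Z = np.array([
    [1, 0, 0.0, 0.0],
    [1, 0, 0.5, 0.0],
    [1, 0, 1.0, 0.0],
    [1, 0, 1.5, 0.0],
    [0, 1, 0.0, 0.0],
    [0, 1, 0.0, 0.5],
    [0, 1, 0.0, 1.0],
    [0, 1, 0.0, 1.5],
])
original = Z.copy()

# The right-hand side uses list (advanced) indexing, which returns a copy,
# so both columns are read before either one is overwritten.
Z[:, [1, 2]] = Z[:, [2, 1]]
print(Z)
```

Note that the hard-coded indices [1, 2] only fit this particular layout of two terms and two groups; with more groups (e.g. four categories in x1) or more terms, the required permutation would be different.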

It would be great if someone could take a look at this and fix it properly (if it really is an issue and not a mistake on my part). I hope this was clear enough; if not, let me know!

Top GitHub Comments

LeonieMei commented, Apr 26, 2022

Yes, it seems to be working now 😃

LeonieMei commented, Apr 4, 2022

Yes, I also made a plot for a model with basically the same specification, but with four categories in the group variable instead of two, and it worked in that case too 👍