`Model.predict()` generates unexpected out-of-sample predictions for a mixed effects model
Hello,
first of all, thanks for the great work on this project, I’ve been using bambi a lot and it has been super helpful!
I’m currently facing a (potential) issue when trying to make out-of-sample predictions for a logistic regression model built with the following formula:
y ~ x1 + x2 + x3 + (0 + x2|x1) + (0 + x3|x1)
where `x1` and `x2` are categorical variables with two categories each and `x3` is a continuous variable.
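To make the structure of the group-specific part concrete, here is a minimal sketch of how the two terms `(0 + x2|x1)` and `(0 + x3|x1)` could expand into columns of a group design matrix. The function name `group_columns`, the label format, and the column order are assumptions made for illustration (the order is inferred from the expected matrix shown later in this issue); they are not taken from bambi's or formulae's actual output.

```python
from itertools import product


def group_columns(x1_levels, x2_levels):
    """Label the columns contributed by (0 + x2|x1) and (0 + x3|x1).

    Hypothetical helper: the ordering (x2 level first, x1 group second)
    is an assumption inferred from the expected Z matrix in this issue.
    """
    # (0 + x2|x1): one indicator column per (x2 level, x1 group) pair
    cols = [f"x2[{b}]|x1[{a}]" for b, a in product(x2_levels, x1_levels)]
    # (0 + x3|x1): one slope column per x1 group, holding the x3 value
    cols += [f"x3|x1[{a}]" for a in x1_levels]
    return cols


columns = group_columns(x1_levels=[0, 1], x2_levels=[0, 1])
for index, label in enumerate(columns):
    print(index, label)
```

With two categories in both `x1` and `x2` this gives 4 + 2 = 6 columns, which matches the width of the matrices below.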
The out-of-sample data I’m trying to make predictions for looks like this (exemplary):
x1 | x2 | x3 |
---|---|---|
0 | 0 | 0 |
0 | 0 | 0.5 |
0 | 0 | 1 |
0 | 0 | 1.5 |
1 | 0 | 0 |
1 | 0 | 0.5 |
1 | 0 | 1 |
1 | 0 | 1.5 |
There was no error when running `model.predict(iData, data=out_of_sample_data, kind='mean')`; however, the spaghetti plot I generated from the posterior predictions looked off for `x1==1`: the variance was much bigger than I expected. (I noticed this because I had manually made a plot displaying the 0.5 decision boundary for `x3`, i.e. the mean value and HDI of `x3` where there is a 50% probability of a positive outcome, and that didn’t match what I saw in the spaghetti plot.)
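A minimal sketch of how such a 0.5 decision boundary could be computed, assuming that for a fixed combination of `x1` and `x2` the linear predictor on the logit scale reduces to `intercept + slope * x3` (with the common and group-specific effects already summed into those two numbers). The function names and the draws are hypothetical and only illustrate the arithmetic; they are not values from the model in this issue.

```python
import math


def sigmoid(eta):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def decision_boundary(intercept, slope):
    """Value of x3 where intercept + slope * x3 == 0, i.e. P(y=1) == 0.5."""
    return -intercept / slope


# Hypothetical posterior draws of (intercept, slope) for one x1/x2 combination
draws = [(-1.0, 2.0), (-1.2, 2.4), (-0.9, 1.5)]

# One boundary per draw; their spread reflects the posterior uncertainty
boundaries = [decision_boundary(a, b) for a, b in draws]
mean_boundary = sum(boundaries) / len(boundaries)
print(boundaries, mean_boundary)
```

Computing the boundary per draw and then summarizing (mean, HDI) gives a quantity that can be compared against where the spaghetti lines cross 0.5.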
I then had a look at the code and noticed that the `Z` matrix generated in the `predict` method in `models.py` looked different from what I expected.
Here’s the code bit I’m referring to (last line):
if self._design.group:
    if in_sample:
        Z = self._design.group.design_matrix
    else:
        Z = self._design.group._evaluate_new_data(data).design_matrix
What I got for `Z` was the following:
1 | 0 | 0 | 0 | 0 | 0 |
1 | 0 | 0 | 0 | 0.5 | 0 |
1 | 0 | 0 | 0 | 1 | 0 |
1 | 0 | 0 | 0 | 1.5 | 0 |
0 | 0 | 1 | 0 | 0 | 0 |
0 | 0 | 1 | 0 | 0 | 0.5 |
0 | 0 | 1 | 0 | 0 | 1 |
0 | 0 | 1 | 0 | 0 | 1.5 |
…but what I was expecting (after trying to make sense of it) was this:
1 | 0 | 0 | 0 | 0 | 0 |
1 | 0 | 0 | 0 | 0.5 | 0 |
1 | 0 | 0 | 0 | 1 | 0 |
1 | 0 | 0 | 0 | 1.5 | 0 |
0 | 1 | 0 | 0 | 0 | 0 |
0 | 1 | 0 | 0 | 0 | 0.5 |
0 | 1 | 0 | 0 | 0 | 1 |
0 | 1 | 0 | 0 | 0 | 1.5 |
So basically the second and third columns are swapped. I then added the following line:
Z[:, [1, 2]] = Z[:, [2, 1]]
to achieve that and the spaghetti plot I then generated matched my expectation.
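The effect of that line can be checked in isolation with NumPy. The sketch below hard-codes the two matrices from this issue (the variable names `observed`, `expected`, and `fixed` are illustrative) and confirms that swapping columns 1 and 2 turns the first into the second. It is a check of the workaround under those assumptions, not a proposed fix for the library.

```python
import numpy as np

# Z as returned for the out-of-sample data (copied from the issue)
observed = np.array([
    [1, 0, 0, 0, 0.0, 0.0],
    [1, 0, 0, 0, 0.5, 0.0],
    [1, 0, 0, 0, 1.0, 0.0],
    [1, 0, 0, 0, 1.5, 0.0],
    [0, 0, 1, 0, 0.0, 0.0],
    [0, 0, 1, 0, 0.0, 0.5],
    [0, 0, 1, 0, 0.0, 1.0],
    [0, 0, 1, 0, 0.0, 1.5],
])

# Z as expected (copied from the issue)
expected = np.array([
    [1, 0, 0, 0, 0.0, 0.0],
    [1, 0, 0, 0, 0.5, 0.0],
    [1, 0, 0, 0, 1.0, 0.0],
    [1, 0, 0, 0, 1.5, 0.0],
    [0, 1, 0, 0, 0.0, 0.0],
    [0, 1, 0, 0, 0.0, 0.5],
    [0, 1, 0, 0, 0.0, 1.0],
    [0, 1, 0, 0, 0.0, 1.5],
])

# Apply the workaround on a copy so the original matrix stays untouched.
# Fancy indexing on the right-hand side makes a copy, so the swap is safe.
fixed = observed.copy()
fixed[:, [1, 2]] = fixed[:, [2, 1]]

print(np.array_equal(fixed, expected))
```

Note that the hard-coded indices `[1, 2]` only apply to this particular layout with two categories in `x1` and `x2`; a different number of groups would need a different permutation.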
It would be great if someone could have a look at this and fix it properly (if it really is an issue and not a mistake on my part). I hope this was clear enough; if not, let me know!
Yes, it seems to be working now 😃
Yes, I also made a plot for a model with basically the same specification but instead of two categories in the group variable I had four and it also worked in that case 👍