
How to implement predictions


I would like to implement a Model.predict() method. My original idea was to replicate predict() in R, where you pass a data frame and obtain predictions for the mean response. However, @aloctavodia helped me realize that we should consider both predictions for the mean response and predictions for the response variable (like what Model.posterior_predictive() does, but also available for out-of-sample data).

So, what I have in mind is something like the following:

predictions = model.predict(idata, kind, xdata=None)

where kind="mean" returns predictions for the mean response (i.e. $g^{-1}(x'b)$) and kind="response" returns draws from the posterior predictive distribution. If xdata is a data frame, predictions are made for the new data set (out-of-sample predictions); otherwise, they are made for the original sample (in-sample predictions).
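To make the two `kind` options concrete, here is a minimal NumPy sketch for a Bernoulli model with a logit link. It is not Bambi's implementation: the helper `predict_sketch` and its arguments are hypothetical. `beta_draws` holds posterior draws of the coefficients with shape `(n_coefs, n_samples)`, and `X` is the design matrix of whichever data set (original or new) you predict on.

```python
import numpy as np

def inv_logit(eta):
    # Inverse of the logit link: maps log-odds to probabilities
    return 1.0 / (1.0 + np.exp(-eta))

def predict_sketch(beta_draws, X, kind="mean", rng=None):
    """Hypothetical helper illustrating kind="mean" vs kind="response".

    beta_draws: posterior draws, shape (n_coefs, n_samples).
    X: design matrix, shape (n_obs, n_coefs).
    Returns an array of shape (n_obs, n_samples).
    """
    # Posterior draws of the mean response g^-1(x'b), one column per draw
    mu = inv_logit(X @ beta_draws)
    if kind == "mean":
        return mu
    if kind == "response":
        # Posterior predictive draws: one Bernoulli outcome per draw of the mean
        rng = np.random.default_rng() if rng is None else rng
        return rng.binomial(1, mu)
    raise ValueError("kind must be 'mean' or 'response'")

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))               # 5 observations, 2 predictors (made up)
beta_draws = rng.normal(size=(2, 1000))   # 1000 made-up posterior draws

mean_pred = predict_sketch(beta_draws, X, kind="mean")
resp_pred = predict_sketch(beta_draws, X, kind="response", rng=rng)
print(mean_pred.shape, resp_pred.shape)  # (5, 1000) (5, 1000)
```

Averaging `resp_pred` over draws gives values close to `mean_pred.mean(axis=1)`; the difference is that `kind="response"` also carries the Bernoulli sampling noise of individual outcomes.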

What I would like to discuss:

- We already have a Model.posterior_predictive() method that samples the posterior predictive distribution using the observed values of the predictors. There’s overlap between this method and the one I want to implement. Should we keep both? Only one? I don’t want two methods doing something similar (or the same thing), but we have to be aware that dropping Model.posterior_predictive() is another API change and can break existing code. Maybe a deprecation warning?
- Argument names and default behavior.
- Anything you would like to add to this discussion.
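If both methods were kept for a while, one common pattern is to turn the older method into a thin wrapper that emits a warning and delegates to the new one. The sketch below uses a toy class; `ToyModel`, its `predict` signature, and the choice to delegate to `kind="response"` are assumptions for illustration, not Bambi's actual API.

```python
import warnings

class ToyModel:
    """Hypothetical stand-in for a Bambi-like model class (not Bambi's API)."""

    def predict(self, idata, kind="mean", xdata=None):
        # Placeholder: a real method would compute predictions here;
        # we just echo the arguments so the delegation is visible.
        return {"kind": kind, "xdata": xdata}

    def posterior_predictive(self, idata):
        # Old entry point kept for backward compatibility
        warnings.warn(
            "posterior_predictive() is deprecated; "
            "use predict(idata, kind='response') instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        # In-sample posterior predictive == predict(kind="response") with xdata=None
        return self.predict(idata, kind="response")


# Usage: the old call still works but now emits a DeprecationWarning
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = ToyModel().posterior_predictive(idata=None)

print(result)  # {'kind': 'response', 'xdata': None}
```

Python hides `DeprecationWarning` by default unless it is triggered from `__main__` or a test runner enables it, which is why some projects use `FutureWarning` for deprecations aimed at end users.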

Linking #356, where this feature is requested.

Issue Analytics

Created 2 years ago; 15 comments.

Top GitHub Comments

tomicapretto commented on Jun 12, 2021 (2 reactions)

Hi @ioannis12

This is an example of how you could do it for mixed-effects models. I used simulated data so it may differ from yours.

import bambi as bmb
import numpy as np
import pandas as pd

from scipy.special import expit

My simulated data

f_delta = pd.DataFrame({
    "A": np.random.choice([0, 1], size=100),
    "B": np.random.normal(size=100),
    "C": np.random.normal(size=100),
    "D": np.random.normal(size=100),
    "E": ["Group 1"] * 25 + ["Group 2"] * 25 + ["Group 3"] * 25 + ["Group 4"] * 25
})

Same model

model = bmb.Model('A ~ 0 + B + C + D + (1|E)', f_delta, family = 'bernoulli')
idata = model.fit()

I just grab some rows from my original data to act as test data; in your case, you would use your own test data.

# Select some random rows from the original data
rows = np.random.choice(np.arange(f_delta.shape[0]), size=20)
f_delta_2 = f_delta.iloc[rows, :]
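Note that `np.random.choice` samples with replacement by default, so the same row can be selected more than once (the repeated value 0.49711422 in the output further down is consistent with that). If you want distinct rows, here is a minimal alternative using NumPy's `Generator` API; the seed and sizes are arbitrary. With pandas, `f_delta.sample(n=20, replace=False)` does the same directly on the data frame.

```python
import numpy as np

rng = np.random.default_rng(1234)
n_rows = 100  # number of rows in the original data frame (f_delta above)

# With replacement (the default): duplicate row positions are possible
rows_with_dupes = rng.choice(n_rows, size=20)

# Without replacement: 20 distinct row positions
rows = rng.choice(n_rows, size=20, replace=False)

# With pandas, these positions would then go to .iloc, e.g. f_delta.iloc[rows, :]
print(len(np.unique(rows)))  # 20
```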

Take the posterior from the InferenceData object returned by model.fit(), and stack the chain and draw dimensions into a single sample dimension.

posterior = idata.posterior.stack(sample=["chain", "draw"])
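Stacking merges the `(chain, draw)` dimensions into a single `sample` dimension of length `chain * draw`, placed last, which is why the coefficient arrays used below have the samples along their second axis. The NumPy sketch below mimics that reshaping for a parameter with one extra dimension (say, 4 group levels); the sizes are made up, and this illustrates the resulting layout rather than how xarray works internally.

```python
import numpy as np

n_chain, n_draw, n_groups = 2, 500, 4
rng = np.random.default_rng(0)

# Made-up posterior draws for a group-specific term, laid out as (chain, draw, group)
draws = rng.normal(size=(n_chain, n_draw, n_groups))

# Move (chain, draw) to the end and merge them into one "sample" axis.
# Chain varies slowest and draw fastest, as in a stacked (chain, draw) index.
stacked = np.moveaxis(draws, (0, 1), (-2, -1)).reshape(n_groups, n_chain * n_draw)

print(stacked.shape)  # (4, 1000)
```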

And now, perhaps, the trickiest part. In mixed-effects models, you have something like $y = X\beta + Zu$ (for the linear predictor), so we need to construct both $X$ and $Z$, for the common and group-specific parts respectively. Here we use a method from the model's internal design object. This is exactly what a higher-level implementation of predictions should spare people from doing, but since it’s not available yet, here we go.
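For a single group-specific intercept such as `(1|E)`, $Z$ is an indicator (one-hot) matrix with one column per group level, so $Zu$ gives each observation the intercept of its own group. The NumPy sketch below builds such a $Z$ by hand only to show its structure; the labels and intercept values are made up. In practice you would still take $Z$ from the model's design object, because its column order must match the order of the levels in the posterior for `1|E`, and `np.unique` simply sorts the labels.

```python
import numpy as np

# Group labels for 6 observations (hypothetical values)
groups = np.array(["Group 1", "Group 2", "Group 1", "Group 3", "Group 2", "Group 3"])

# levels: sorted unique labels; idx: position of each observation's level
levels, idx = np.unique(groups, return_inverse=True)

# One-hot matrix Z of shape (n_obs, n_levels)
Z = np.zeros((groups.size, levels.size))
Z[np.arange(groups.size), idx] = 1.0

# One intercept per group level (a single posterior draw, for illustration)
u = np.array([0.5, -1.0, 2.0])

# Z @ u picks each observation's group intercept
print((Z @ u).tolist())  # [0.5, -1.0, 0.5, 2.0, -1.0, 2.0]
```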

X = model._design.common._evaluate_new_data(f_delta_2).design_matrix
Z = model._design.group._evaluate_new_data(f_delta_2).design_matrix.toarray()

The coefficients $\beta$ and $u$ are taken from the posterior.

# Coefficients for common and group specific parts
β = np.vstack([np.atleast_2d(posterior[name]) for name in model.common_terms])
# I just grab the only group-specific term in the model. Otherwise you could use a comprehension as above,
# but using `model.group_specific_terms`.
u = np.vstack([np.atleast_2d(posterior["1|E"])])

And finally, compute the linear predictor and pass it to the expit() function (to convert from log-odds to probability scale). You can summarize the posterior for each individual with the mean, or any other function you want. Or you can just keep the posterior!

predictions = expit(np.dot(X, β) + np.dot(Z, u))
predictions.mean(1)

array([0.72031907, 0.42519583, 0.56074857, 0.4676208 , 0.43403285,
       0.4408966 , 0.49711422, 0.49851915, 0.41220761, 0.57956251,
       0.59578587, 0.63213104, 0.53078604, 0.68282899, 0.43710477,
       0.49711422, 0.52201541, 0.35639774, 0.64629308, 0.35083164])
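Here is a NumPy-only sketch of the same final step with made-up arrays, which makes the shapes explicit and shows a summary other than the mean (a 94% equal-tailed interval per observation). The sizes, the random values, and the 94% level are arbitrary choices; `inv_logit` is written out by hand and computes the same function as `scipy.special.expit`.

```python
import numpy as np

rng = np.random.default_rng(42)
n_obs, n_common, n_groups, n_samples = 20, 3, 4, 2000

X = rng.normal(size=(n_obs, n_common))                       # common design matrix
Z = np.eye(n_groups)[rng.integers(0, n_groups, size=n_obs)]  # one-hot group matrix
beta = rng.normal(size=(n_common, n_samples))                # draws of beta
u = rng.normal(scale=0.5, size=(n_groups, n_samples))        # draws of u

def inv_logit(eta):
    # Same function as scipy.special.expit: log-odds -> probability
    return 1.0 / (1.0 + np.exp(-eta))

# X beta + Z u for every observation and draw, shape (n_obs, n_samples)
p = inv_logit(X @ beta + Z @ u)

# Per-observation summaries over the posterior draws
p_mean = p.mean(axis=1)
p_low, p_high = np.quantile(p, [0.03, 0.97], axis=1)
```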

Edit: This is something anyone can do with their models if they need predictions right now. In the near future, this will be replaced by a method call such as Model.predict().

tomicapretto commented on Jul 13, 2021 (1 reaction)

@ioannis12 @gregorystrubel PR #372 implements predictions for Bambi. Feel free to leave your suggestions if you want. Thanks!
