Design how we're going to extend Bambi

The following is a list of features we’re missing (or covering only partially) in Bambi:

Distributional models (i.e., we model more than the mean parameter of the response).
Multivariate models (i.e., the response follows a multivariate distribution).
Non-linear models.
Survival models/Models with censored data.
Ordinal models.
Zero and Zero-One inflated models.

The last three points (survival/censored, ordinal, and zero/zero-one inflated) are covered by the first two points (distributional and multivariate) if we implement them appropriately. The third point, non-linear models, is a separate problem. I’ll try to add a couple of things I’ve been thinking about lately.

Distributional models

Some API proposals

# Proposed API: bmb.formula() does not exist yet
formula = bmb.formula(
    "y ~ a + b",
    "sigma ~ a",
)
priors = {
    "a": bmb.Prior("Normal", mu=0, sigma=1),
    "b": bmb.Prior("Normal", mu=0, sigma=1),
    "sigma_Intercept": bmb.Prior("Normal", mu=0, sigma=1),
    "sigma_a": bmb.Prior("Normal", mu=0, sigma=1),
}
link = {"mu": "identity", "sigma": "log"}
# Keyword arguments avoid clashing with the positional family argument
model = bmb.Model(formula, data, priors=priors, link=link)

We need a formula object where we can have multiple formula parts. I propose to call it bmb.formula(). There’s an open discussion in #423.
We need a name for the terms associated with the auxiliary parameters. I propose to use {param}_{term}, such as sigma_a for the term a in the sigma formula.
We need a transformation that maps the linear predictor of each auxiliary parameter onto a scale that makes sense for that parameter. I propose we have defaults for the built-in families that can be overridden with a dictionary. Note that the link argument of Model does not currently accept a dictionary.
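The "defaults that can be overridden with a dictionary" idea could look roughly like the sketch below. Everything in it is an assumption made for illustration: the DEFAULT_LINKS table, the resolve_links helper, and the rule that a plain string applies to "mu" are hypothetical and not taken from Bambi.

```python
# Hypothetical per-family defaults; the entries are illustrative assumptions,
# not Bambi's actual tables.
DEFAULT_LINKS = {
    "gaussian": {"mu": "identity", "sigma": "log"},
}


def resolve_links(family, link=None):
    """Merge user-supplied links over the defaults of a family."""
    defaults = DEFAULT_LINKS[family]
    if link is None:
        return dict(defaults)
    if isinstance(link, str):
        # Assumption: a plain string keeps targeting the mean parameter only.
        link = {"mu": link}
    unknown = set(link) - set(defaults)
    if unknown:
        raise ValueError(
            f"Unknown parameters for family '{family}': {sorted(unknown)}"
        )
    # Later keys win, so user choices override the defaults.
    return {**defaults, **link}


links = resolve_links("gaussian", {"sigma": "softplus"})
```

Accepting both a string and a dictionary would keep existing calls working while letting users override a single auxiliary parameter.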

I haven’t thought much more about the implementation details, where other concerns may appear. For the moment, I think it’s good to discuss the API we want. Any objections, any suggestions, any drawbacks I’m not seeing?

Multivariate models

We currently support some multivariate families, such as "categorical" and "multinomial". I feel we should think more about the implementation. I think we could make it more general so we don’t need to handle all cases as special cases. With that said, I think there are other things to discuss.

What do we use to indicate a multivariate response?

"c(y1, y2, ..., yn) ~ ..."
"mvbind(y1, y2, ..., yn) ~ ..."
bmb.formula("y1 ~ ...", "y2 ~ ...", "y3 ~ ...")

Note that the last alternative allows different predictors to be included for each response.
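To make the multi-formula alternative concrete, here is a minimal sketch of how several formula strings could be grouped by response. The split_formulas helper is hypothetical and only splits on "~"; it does not parse the right-hand side the way formulae would.

```python
def split_formulas(*formulas):
    """Map each response name to the right-hand side of its formula."""
    parts = {}
    for formula in formulas:
        response, sep, predictors = formula.partition("~")
        if not sep:
            raise ValueError(f"Missing '~' in formula: {formula!r}")
        response = response.strip()
        if response in parts:
            raise ValueError(f"Duplicate response: {response!r}")
        parts[response] = predictors.strip()
    return parts


# Each response can carry its own set of predictors.
parts = split_formulas("y1 ~ a + b", "y2 ~ a", "y3 ~ b + c")
```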

How much do we want to support multivariate families?

I’m not an expert in this area, but I have the feeling that things can get very complex very quickly. And I’m not sure this is a highly requested feature.

For now, I tend to think we should offer minimal support, enough for users and us to explore the available possibilities and refine the API.

Non-linear models

This has been discussed a little in #448. I think it’s a very nice-to-have feature, but I haven’t fully worked it out yet. All I have are some API proposals, and I don’t see how to implement them without a huge effort.

First:

formula = bmb.formula(
    "y ~ b1 * np.exp(b2 * x)",
    nlpars=("b1", "b2"),
)

But this comes with a major problem: how do we override the meaning of the * operator in the formula syntax? If we pass something like that to formulae, it won’t multiply things by b1 or b2; it will try to construct full interaction terms between the operands. I like how this approach looks, but it would require a huge amount of effort to parse terms and parameters.

Another alternative would be to use a function.

import numpy as np

def f(x, b1, b2):
    return b1 * np.exp(b2 * x)

formula = bmb.formula(
    "y ~ f(x, b1, b2)",
    nlpars=("b1", "b2"),
)

This would work on the formulae side, but again we would need some parsing to recover the non-linear relationship between the parameters (b1 and b2) and the predictor x. How do we handle arbitrarily complex functions? I’m not sure.
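One way to limit the parsing, offered only as a sketch: read the function signature instead of its body, and treat every argument not listed in nlpars as a predictor. The split_arguments helper is hypothetical and not part of Bambi or formulae, and math.exp stands in for np.exp only to keep the example dependency-free. It says nothing about how the function would then be turned into a model.

```python
import inspect
import math


def f(x, b1, b2):
    return b1 * math.exp(b2 * x)


def split_arguments(func, nlpars):
    """Separate a function's arguments into predictors and parameters."""
    names = list(inspect.signature(func).parameters)
    missing = [name for name in nlpars if name not in names]
    if missing:
        raise ValueError(f"Not arguments of {func.__name__}: {missing}")
    # Anything not declared as a non-linear parameter is assumed to be data.
    predictors = [name for name in names if name not in nlpars]
    return predictors, list(nlpars)


predictors, parameters = split_arguments(f, ("b1", "b2"))
```

With this split the function body stays a black box, so arbitrarily complex functions would not need to be parsed, at the cost of requiring users to declare nlpars explicitly.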

Survival models/Models with censored data.

#543 adds support for survival analysis with right-censored data. One drawback of the proposal is that family="exponential" always implies right-censored data. I think we should have something more general.

I imagine all of the following cases working:

bmb.Model("y ~ ...", data, family="exponential")
bmb.Model("censored(y, status) ~ ...", data, family="exponential")
bmb.Model("censored(y, status, 'left') ~ ...", data, family="exponential")

The challenge is that censored() should be a function that returns an array-like structure (so formulae knows how to handle it) with some attribute that enables Bambi to figure out the characteristics of the censoring. I’m not sure how to implement this but I know it’s feasible.
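A possible shape for such a function, as a sketch under assumptions: a NumPy ndarray subclass that carries a censoring attribute. The CensoredArray class, the two-column layout (value, status), and the attribute name are all hypothetical; whether formulae would accept this object unchanged is not verified here.

```python
import numpy as np


class CensoredArray(np.ndarray):
    """Hypothetical array that remembers how its values were censored."""

    def __new__(cls, values, censoring="right"):
        obj = np.asarray(values, dtype=float).view(cls)
        obj.censoring = censoring
        return obj

    def __array_finalize__(self, obj):
        # Called for views and slices too, so the metadata travels along.
        if obj is None:
            return
        self.censoring = getattr(obj, "censoring", None)


def censored(y, status, kind="right"):
    """Stack the response and its status, tagging the kind of censoring."""
    if kind not in ("left", "right"):
        raise ValueError(f"Unsupported censoring kind: {kind!r}")
    return CensoredArray(np.column_stack([y, status]), censoring=kind)


out = censored([1.5, 2.0, 3.2], [1, 0, 1], "left")
```

Bambi could then check for the censoring attribute on the response to decide which likelihood to build.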

Ordinal models and Zero and Zero-One inflated models.

I think these ones come almost for free if we do a good job with the tasks above.

Top GitHub Comments

canyon289 commented, Jul 15, 2022

Other than the technical implementation, frankly I don’t think it’ll be all that useful, and there’s not a huge userbase for it. If people want non-linear models they can just use PyMC to code those up.

The other use cases imo are much easier to implement in Bambi and will have a wider userbase.

zwelitunyiswa commented, Oct 22, 2022

Nice. I like that structure - very clear. Distributional models are a very cool addition!

On Sat, Oct 22, 2022 at 15:08, Tomás Capretto wrote:

I have new ideas for distributional models

Instead of this

formula = bmb.formula(
    "y ~ a + b",
    "sigma ~ a",
)
priors = {
    "a": bmb.Prior("Normal", mu=0, sigma=1),
    "b": bmb.Prior("Normal", mu=0, sigma=1),
    "sigma_Intercept": bmb.Prior("Normal", mu=0, sigma=1),
    "sigma_a": bmb.Prior("Normal", mu=0, sigma=1),
}
link = {"mu": "identity", "sigma": "log"}
model = bmb.Model(formula, data, priors=priors, link=link)

have this (notice the priors)

formula = bmb.formula(
    "y ~ a + b",
    "sigma ~ a",
)
priors = {
    "y": {
        "a": bmb.Prior("Normal", mu=0, sigma=1),
        "b": bmb.Prior("Normal", mu=0, sigma=1),
    },
    "sigma": {
        "Intercept": bmb.Prior("Normal", mu=0, sigma=1),
        "a": bmb.Prior("Normal", mu=0, sigma=1),
    },
}
link = {"mu": "identity", "sigma": "log"}
model = bmb.Model(formula, data, priors=priors, link=link)

It adds more structure and saves us from parsing strings to decide which response a prior corresponds to. Also, "_" is very common in variable names, so string parsing would very likely get it wrong.
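The underscore problem can be shown with a small sketch. The candidate_splits helper and the names y, y_a, b, and a_b are hypothetical, and plain strings stand in for bmb.Prior objects.

```python
def candidate_splits(flat_name, responses, terms):
    """Return every (response, term) pair consistent with '{response}_{term}'."""
    matches = []
    for index, char in enumerate(flat_name):
        if char != "_":
            continue
        response, term = flat_name[:index], flat_name[index + 1:]
        if response in responses and term in terms:
            matches.append((response, term))
    return matches


# With underscores in names, one flat key can map to two different priors.
responses = {"y", "y_a"}
terms = {"b", "a_b"}
ambiguous = candidate_splits("y_a_b", responses, terms)

# The nested layout needs no parsing, so the same names stay unambiguous.
nested_priors = {
    "y": {"a_b": "Normal(0, 1)"},
    "y_a": {"b": "Normal(0, 1)"},
}
```

Here the flat key "y_a_b" matches both ("y", "a_b") and ("y_a", "b"), while the nested dictionary addresses each prior directly.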