Question: Massage InferenceData for multi-variable model comparison?
Short Description
I have a PyMC3 model that is partitioned into 6 sub-models, with the observations partitioned into 6 matching subsets. This lets me apply different parameters depending on the values of the independent variables without complex indexing. I can't use a mixture because there is no distribution over the independent variables.

I know model comparison only works for models with a single observed RV, but I was wondering: is there some way to "unpartition" the InferenceData so that ArviZ can treat the six vectors of observations as one big vector and compute model comparison metrics?

I can compare the submodels individually, but that does not properly account for the hyperparameters that link them.
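
For concreteness, here is a minimal sketch of the kind of partitioned setup I mean (the variable names and toy data are hypothetical, not my actual model): six observed RVs, one per data subset, linked through shared hyperparameters.

```python
import numpy as np
import pymc3 as pm

# Toy stand-in for the six partitioned observation subsets (hypothetical data).
rng = np.random.default_rng(0)
y_subsets = [rng.normal(size=20) for _ in range(6)]

with pm.Model() as model:
    # Shared hyperparameters that link the sub-models.
    mu_hyper = pm.Normal("mu_hyper", mu=0.0, sigma=1.0)
    sigma_hyper = pm.HalfNormal("sigma_hyper", sigma=1.0)

    for i, y in enumerate(y_subsets):
        # One set of parameters and one observed RV per partition.
        mu_i = pm.Normal(f"mu_{i}", mu=mu_hyper, sigma=sigma_hyper)
        pm.Normal(f"y_{i}", mu=mu_i, sigma=1.0, observed=y)

    idata = pm.sample(return_inferencedata=True)
```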
Issue Analytics
- Created: 4 years ago
- Comments: 8 (8 by maintainers)
@ahartikainen I would like to compare the models as a whole (rather than comparing the submodels), so I think I want to do what @OriolAbril suggests: effectively concatenate all the observed random variables into one big random variable. This is reasonable because the observations are treated as interchangeable by the model.
After combining all the log_likelihood data into a single array (stored in `sample_stats.log_likelihood`), `waic` and `loo` will calculate the IC assuming all observations are conditionally independent (or independent; I am not completely sure, as I have not found time to do the math). Both functions already work with n-dimensional arrays, so if that assumption holds in your case the results will be correct; otherwise you would have to implement the correct version of the algorithm or wait a little longer.

Note: ArviZ currently looks at `sample_stats` first in order to raise a warning if the log likelihood is still stored there, so doing this will only trigger a deprecation warning and not the annoying

"Found several log likelihood arrays {}, var_name cannot be None"

error.
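
For illustration, a minimal sketch of the combining step, assuming the more recent InferenceData layout where pointwise log likelihoods live in a dedicated `log_likelihood` group with one variable per observed RV (the hypothetical `y_0` … `y_5` from the sketch above); on older ArviZ versions the combined array would go under `sample_stats.log_likelihood` as described:

```python
import numpy as np
import arviz as az

# idata is assumed to hold one pointwise log-likelihood array per observed RV,
# e.g. variables "y_0" ... "y_5", each shaped (chain, draw, n_obs_i).
ll = idata.log_likelihood

# Concatenate the per-partition arrays along the observation axis so the six
# vectors of pointwise log likelihoods become one big vector. The reshape
# flattens any trailing observation dimensions.
combined = np.concatenate(
    [ll[name].values.reshape(ll.sizes["chain"], ll.sizes["draw"], -1)
     for name in ll.data_vars],
    axis=-1,
)

# Build a new InferenceData whose log likelihood is a single variable, so that
# az.waic / az.loo see one observed RV instead of six.
idata_combined = az.from_dict(
    posterior={k: v.values for k, v in idata.posterior.items()},
    log_likelihood={"y_all": combined},
)

print(az.loo(idata_combined))
print(az.waic(idata_combined))
```

Concatenating along the observation axis keeps the (chain, draw) structure intact, which is all `waic` and `loo` need for their pointwise computations.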