
sample_posterior_predictive interferes with the summary of the previous sample

See original GitHub issue

For example, if I use the following code:

with pm.Model() as model:
    mu = pm.Normal('mu')
    sigma = pm.HalfNormal('sigma')
    pm.Normal('like', mu, sigma, observed=X)
    trace_X = pm.sample()
    
    Y_ = pm.Normal('Y_', mu, sigma)
    Y = pm.Deterministic('Y', Y_ + 2)
    
    ppc = pm.sample_posterior_predictive(trace_X, vars=[Y])

and then call

az.summary(trace_X)

I get the following error:

TypeError                                 Traceback (most recent call last)
<ipython-input-14-475fd4d1b7ee> in <module>
----> 1 az.summary(trace_X)

~/anaconda3/lib/python3.6/site-packages/arviz/stats/stats.py in summary(data, var_names, fmt, round_to, include_circ, stat_funcs, extend, credible_interval, order, index_origin)
    843 
    844     """
--> 845     posterior = convert_to_dataset(data, group="posterior")
    846     var_names = _var_names(var_names, posterior)
    847     posterior = posterior if var_names is None else posterior[var_names]

~/anaconda3/lib/python3.6/site-packages/arviz/data/converters.py in convert_to_dataset(obj, group, coords, dims)
    160     xarray.Dataset
    161     """
--> 162     inference_data = convert_to_inference_data(obj, group=group, coords=coords, dims=dims)
    163     dataset = getattr(inference_data, group, None)
    164     if dataset is None:

~/anaconda3/lib/python3.6/site-packages/arviz/data/converters.py in convert_to_inference_data(obj, group, coords, dims, **kwargs)
     81             return from_pystan(**kwargs)
     82     elif obj.__class__.__name__ == "MultiTrace":  # ugly, but doesn't make PyMC3 a requirement
---> 83         return from_pymc3(trace=kwargs.pop(group), **kwargs)
     84     elif obj.__class__.__name__ == "EnsembleSampler":  # ugly, but doesn't make emcee a requirement
     85         return from_emcee(sampler=kwargs.pop(group), **kwargs)

~/anaconda3/lib/python3.6/site-packages/arviz/data/io_pymc3.py in from_pymc3(trace, prior, posterior_predictive, coords, dims)
    224         posterior_predictive=posterior_predictive,
    225         coords=coords,
--> 226         dims=dims,
    227     ).to_inference_data()

~/anaconda3/lib/python3.6/site-packages/arviz/data/io_pymc3.py in to_inference_data(self)
    208             **{
    209                 "posterior": self.posterior_to_xarray(),
--> 210                 "sample_stats": self.sample_stats_to_xarray(),
    211                 "posterior_predictive": self.posterior_predictive_to_xarray(),
    212                 "prior": self.prior_to_xarray(),

~/anaconda3/lib/python3.6/site-packages/arviz/data/base.py in wrapped(cls, *args, **kwargs)
     30                 if all([getattr(cls, prop_i) is None for prop_i in prop]):
     31                     return None
---> 32             return func(cls, *args, **kwargs)
     33 
     34         return wrapped

~/anaconda3/lib/python3.6/site-packages/arviz/data/io_pymc3.py in sample_stats_to_xarray(self)
    104             name = rename_key.get(stat, stat)
    105             data[name] = np.array(self.trace.get_sampler_stats(stat, combine=False))
--> 106         log_likelihood, dims = self._extract_log_likelihood()
    107         if log_likelihood is not None:
    108             data["log_likelihood"] = log_likelihood

~/anaconda3/lib/python3.6/site-packages/arviz/data/base.py in wrapped(cls, *args, **kwargs)
     30                 if all([getattr(cls, prop_i) is None for prop_i in prop]):
     31                     return None
---> 32             return func(cls, *args, **kwargs)
     33 
     34         return wrapped

~/anaconda3/lib/python3.6/site-packages/arviz/data/base.py in wrapped(cls, *args, **kwargs)
     30                 if all([getattr(cls, prop_i) is None for prop_i in prop]):
     31                     return None
---> 32             return func(cls, *args, **kwargs)
     33 
     34         return wrapped

~/anaconda3/lib/python3.6/site-packages/arviz/data/io_pymc3.py in _extract_log_likelihood(self)
     81         chain_likelihoods = []
     82         for chain in self.trace.chains:
---> 83             log_like = [log_likelihood_vals_point(point) for point in self.trace.points([chain])]
     84             chain_likelihoods.append(np.stack(log_like))
     85         return np.stack(chain_likelihoods), coord_name

~/anaconda3/lib/python3.6/site-packages/arviz/data/io_pymc3.py in <listcomp>(.0)
     81         chain_likelihoods = []
     82         for chain in self.trace.chains:
---> 83             log_like = [log_likelihood_vals_point(point) for point in self.trace.points([chain])]
     84             chain_likelihoods.append(np.stack(log_like))
     85         return np.stack(chain_likelihoods), coord_name

~/anaconda3/lib/python3.6/site-packages/arviz/data/io_pymc3.py in log_likelihood_vals_point(point)
     73             log_like_vals = []
     74             for var, log_like in cached:
---> 75                 log_like_val = utils.one_de(log_like(point))
     76                 if var.missing_values:
     77                     log_like_val = log_like_val[~var.observations.mask]

~/anaconda3/lib/python3.6/site-packages/pymc3/model.py in __call__(self, *args, **kwargs)
   1280     def __call__(self, *args, **kwargs):
   1281         point = Point(model=self.model, *args, **kwargs)
-> 1282         return self.f(**point)
   1283 
   1284 compilef = fastfn

~/anaconda3/lib/python3.6/site-packages/theano/compile/function_module.py in __call__(self, *args, **kwargs)
    884                     raise TypeError("Missing required input: %s" %
    885                                     getattr(self.inv_finder[c], 'variable',
--> 886                                             self.inv_finder[c]))
    887                 if c.provided > 1:
    888                     restore_defaults()

TypeError: Missing required input: Y_

(see the gist here)

Expected behavior: it worked as expected with the previous PyMC3 version, 3.7.

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

1 reaction
OriolAbril commented, Dec 16, 2019

@aakhmetz I had some time to look into this. The issue is actually not the one I thought it was related to; it is basically that this use case is not supported by ArviZ.

In from_pymc3, ArviZ tries to extract all the information from the trace object and store it in multidimensional labeled arrays. One piece of information it tries to extract is the log likelihood, which in PyMC3 can be computed by calling var.logp_elemwise on observed variables; in your case, like is an observed variable. The issue is with the call signature of this function: logp_elemwise expects as input a dictionary whose keys are variable names and whose values are the value of each variable at some given draw. ArviZ tries to call it with the samples stored in the trace; however, since the trace contains neither Y_ nor Y (which are now expected after including them in the model), it raises an error.
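To make the failure concrete, here is a rough sketch of the call that breaks, using the model and trace from the example above (this is a simplification of what ArviZ does internally, not its exact code):

# the observed variable 'like' from the model above
like_rv = model.observed_RVs[0]

# one posterior draw from the trace, e.g. {'mu': ..., 'sigma': ..., ...}
point = trace_X.point(0)

# After Y_ and Y were added to the model, the compiled log-likelihood function
# also expects a value for Y_, which the trace does not contain, so this
# raises "TypeError: Missing required input: Y_"
like_rv.logp_elemwise(point)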

I don’t know how to solve this properly, though; I can only offer two workarounds.

The first is to convert the trace to inference data before modifying the model:

with pm.Model() as model:
    mu = pm.Normal('mu')
    sigma = pm.HalfNormal('sigma')
    pm.Normal('like', mu, sigma, observed=X)
    trace_X = pm.sample()
    idata = az.from_pymc3(trace_X)
    
    Y_ = pm.Normal('Y_', mu, sigma)
    Y = pm.Deterministic('Y', Y_ + 2)
    
    ppc = pm.sample_posterior_predictive(trace_X, vars=[Y])

az.summary(idata)

This has the advantage of including the proper log likelihood in the inference data object; I don’t know if you need this info.
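If you do need it, with the ArviZ version shown in the traceback the pointwise log likelihood is stored in the sample_stats group, so (as a rough sketch) you can inspect it there or pass the inference data to the information-criterion functions:

# pointwise log likelihood extracted from the observed variable 'like'
idata.sample_stats.log_likelihood

# use it for model comparison, e.g. approximate leave-one-out cross-validation
az.loo(idata)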

The second option is to tweak the model so that ArviZ believes there are no observed variables and therefore does not try to compute the likelihoods.

with pm.Model() as model:
    mu = pm.Normal('mu')
    sigma = pm.HalfNormal('sigma')
    pm.Normal('like', mu, sigma, observed=X)
    trace_X = pm.sample()
    
    Y_ = pm.Normal('Y_', mu, sigma)
    Y = pm.Deterministic('Y', Y_ + 2)
    
    ppc = pm.sample_posterior_predictive(trace_X, vars=[Y])

model.observed_RVs = []
az.summary(trace_X)

This will not make the log likelihood data available, but it does allow including the posterior predictive samples in the inference data object:

model.observed_RVs = []
idata = az.from_pymc3(trace_X, posterior_predictive=ppc)
az.summary(idata)

# access Y values with
idata.posterior_predictive.Y

# or directly plot their kde
az.plot_posterior(idata, group="posterior_predictive")

Converting “manually” to inference data has the advantage of avoiding the conversion every single time you call an ArviZ function, and it also lets you use custom named coordinates and dimensions and take advantage of all the groups. However, I think that none of the options above handles the constant_data group properly; I don’t think that is relevant here, though.
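To illustrate the custom coordinates and dimensions point, here is a minimal sketch for the first workaround (the coordinate name obs_id is arbitrary, and X is assumed to be a 1-d array of observations):

import numpy as np

# label the observations of 'like' with an arbitrary coordinate called 'obs_id'
idata = az.from_pymc3(
    trace_X,
    coords={"obs_id": np.arange(len(X))},
    dims={"like": ["obs_id"]},
)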

0 reactions
aakhmetz commented, Dec 1, 2019

@OriolAbril Thank you very much for your time! Yes, that looks reasonable. The command az.from_pymc3 is new to me, so I will try to read more about it.

After your reply, I realized that I also used separate blocks before:

with pm.Model() as model:
    mu = pm.Normal('mu')
    sigma = pm.HalfNormal('sigma')
    pm.Normal('like', mu, sigma, observed=X)
    trace_X = pm.sample()

display(pm.summary(trace_X))
    
with model:
    Y_ = pm.Normal('Y_', mu, sigma)
    Y = pm.Deterministic('Y', Y_ + 2)
    
    ppc = pm.sample_posterior_predictive(trace_X, vars=[Y])

which, I believe, should also work. But some time ago I adopted the shortcut shown above instead.

Thank you again for checking.
