
sample_posterior_predictive interferes with the summary of the previous sample

See original GitHub issue

For example, if I use the following code:

with pm.Model() as model:
    mu = pm.Normal('mu')
    sigma = pm.HalfNormal('sigma')
    pm.Normal('like', mu, sigma, observed=X)
    trace_X = pm.sample()
    
    Y_ = pm.Normal('Y_', mu, sigma)
    Y = pm.Deterministic('Y', Y_ + 2)
    
    ppc = pm.sample_posterior_predictive(trace_X, vars=[Y])

and then call

az.summary(trace_X)

I get the following error:

TypeError                                 Traceback (most recent call last)
<ipython-input-14-475fd4d1b7ee> in <module>
----> 1 az.summary(trace_X)

~/anaconda3/lib/python3.6/site-packages/arviz/stats/stats.py in summary(data, var_names, fmt, round_to, include_circ, stat_funcs, extend, credible_interval, order, index_origin)
    843 
    844     """
--> 845     posterior = convert_to_dataset(data, group="posterior")
    846     var_names = _var_names(var_names, posterior)
    847     posterior = posterior if var_names is None else posterior[var_names]

~/anaconda3/lib/python3.6/site-packages/arviz/data/converters.py in convert_to_dataset(obj, group, coords, dims)
    160     xarray.Dataset
    161     """
--> 162     inference_data = convert_to_inference_data(obj, group=group, coords=coords, dims=dims)
    163     dataset = getattr(inference_data, group, None)
    164     if dataset is None:

~/anaconda3/lib/python3.6/site-packages/arviz/data/converters.py in convert_to_inference_data(obj, group, coords, dims, **kwargs)
     81             return from_pystan(**kwargs)
     82     elif obj.__class__.__name__ == "MultiTrace":  # ugly, but doesn't make PyMC3 a requirement
---> 83         return from_pymc3(trace=kwargs.pop(group), **kwargs)
     84     elif obj.__class__.__name__ == "EnsembleSampler":  # ugly, but doesn't make emcee a requirement
     85         return from_emcee(sampler=kwargs.pop(group), **kwargs)

~/anaconda3/lib/python3.6/site-packages/arviz/data/io_pymc3.py in from_pymc3(trace, prior, posterior_predictive, coords, dims)
    224         posterior_predictive=posterior_predictive,
    225         coords=coords,
--> 226         dims=dims,
    227     ).to_inference_data()

~/anaconda3/lib/python3.6/site-packages/arviz/data/io_pymc3.py in to_inference_data(self)
    208             **{
    209                 "posterior": self.posterior_to_xarray(),
--> 210                 "sample_stats": self.sample_stats_to_xarray(),
    211                 "posterior_predictive": self.posterior_predictive_to_xarray(),
    212                 "prior": self.prior_to_xarray(),

~/anaconda3/lib/python3.6/site-packages/arviz/data/base.py in wrapped(cls, *args, **kwargs)
     30                 if all([getattr(cls, prop_i) is None for prop_i in prop]):
     31                     return None
---> 32             return func(cls, *args, **kwargs)
     33 
     34         return wrapped

~/anaconda3/lib/python3.6/site-packages/arviz/data/io_pymc3.py in sample_stats_to_xarray(self)
    104             name = rename_key.get(stat, stat)
    105             data[name] = np.array(self.trace.get_sampler_stats(stat, combine=False))
--> 106         log_likelihood, dims = self._extract_log_likelihood()
    107         if log_likelihood is not None:
    108             data["log_likelihood"] = log_likelihood

~/anaconda3/lib/python3.6/site-packages/arviz/data/base.py in wrapped(cls, *args, **kwargs)
     30                 if all([getattr(cls, prop_i) is None for prop_i in prop]):
     31                     return None
---> 32             return func(cls, *args, **kwargs)
     33 
     34         return wrapped

~/anaconda3/lib/python3.6/site-packages/arviz/data/base.py in wrapped(cls, *args, **kwargs)
     30                 if all([getattr(cls, prop_i) is None for prop_i in prop]):
     31                     return None
---> 32             return func(cls, *args, **kwargs)
     33 
     34         return wrapped

~/anaconda3/lib/python3.6/site-packages/arviz/data/io_pymc3.py in _extract_log_likelihood(self)
     81         chain_likelihoods = []
     82         for chain in self.trace.chains:
---> 83             log_like = [log_likelihood_vals_point(point) for point in self.trace.points([chain])]
     84             chain_likelihoods.append(np.stack(log_like))
     85         return np.stack(chain_likelihoods), coord_name

~/anaconda3/lib/python3.6/site-packages/arviz/data/io_pymc3.py in <listcomp>(.0)
     81         chain_likelihoods = []
     82         for chain in self.trace.chains:
---> 83             log_like = [log_likelihood_vals_point(point) for point in self.trace.points([chain])]
     84             chain_likelihoods.append(np.stack(log_like))
     85         return np.stack(chain_likelihoods), coord_name

~/anaconda3/lib/python3.6/site-packages/arviz/data/io_pymc3.py in log_likelihood_vals_point(point)
     73             log_like_vals = []
     74             for var, log_like in cached:
---> 75                 log_like_val = utils.one_de(log_like(point))
     76                 if var.missing_values:
     77                     log_like_val = log_like_val[~var.observations.mask]

~/anaconda3/lib/python3.6/site-packages/pymc3/model.py in __call__(self, *args, **kwargs)
   1280     def __call__(self, *args, **kwargs):
   1281         point = Point(model=self.model, *args, **kwargs)
-> 1282         return self.f(**point)
   1283 
   1284 compilef = fastfn

~/anaconda3/lib/python3.6/site-packages/theano/compile/function_module.py in __call__(self, *args, **kwargs)
    884                     raise TypeError("Missing required input: %s" %
    885                                     getattr(self.inv_finder[c], 'variable',
--> 886                                             self.inv_finder[c]))
    887                 if c.provided > 1:
    888                     restore_defaults()

TypeError: Missing required input: Y_

(see the gist here)

Expected behavior: it worked as expected with the previous PyMC3 version, 3.7.

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

1 reaction
OriolAbril commented, Dec 16, 2019

@aakhmetz I had some time to look into this. The issue is actually not the one I thought it was related to; it is basically that this use case is not supported by ArviZ.

In from_pymc3, ArviZ tries to extract all the information from the trace object and store it in multidimensional labeled arrays. One piece of information it tries to extract is the log likelihood, which in PyMC3 can be computed by calling var.logp_elemwise on observed variables; in your case, like is an observed variable. The issue is with the call signature of this function: logp_elemwise expects as input a dictionary whose keys are variable names and whose values are the value of each variable at some given draw. ArviZ tries to call it with the samples stored in the trace; however, since the trace contains neither Y_ nor Y (which are now expected after including them in the model), it raises an error.
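To make the failure concrete, here is a rough sketch of the call that breaks, using the model and trace from the example above (this is a simplification of what ArviZ does internally, not its exact code):

# the observed variable 'like' from the model above
like_rv = model.observed_RVs[0]

# one posterior draw from the trace, e.g. {'mu': ..., 'sigma': ..., ...}
point = trace_X.point(0)

# After Y_ and Y were added to the model, the compiled log-likelihood function
# also expects a value for Y_, which the trace does not contain, so this
# raises "TypeError: Missing required input: Y_"
like_rv.logp_elemwise(point)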

I don’t know how to solve this properly, though; I can only offer two workarounds.

The first is to convert the trace to inference data before modifying the model:

with pm.Model() as model:
    mu = pm.Normal('mu')
    sigma = pm.HalfNormal('sigma')
    pm.Normal('like', mu, sigma, observed=X)
    trace_X = pm.sample()
    idata = az.from_pymc3(trace_X)
    
    Y_ = pm.Normal('Y_', mu, sigma)
    Y = pm.Deterministic('Y', Y_ + 2)
    
    ppc = pm.sample_posterior_predictive(trace_X, vars=[Y])

az.summary(idata)

This has the advantage of including the proper log likelihood in the inference data object; I don’t know if you need this info.
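If you do need it, with the ArviZ version shown in the traceback the pointwise log likelihood is stored in the sample_stats group, so (as a rough sketch) you can inspect it there or pass the inference data to the information-criterion functions:

# pointwise log likelihood extracted from the observed variable 'like'
idata.sample_stats.log_likelihood

# use it for model comparison, e.g. approximate leave-one-out cross-validation
az.loo(idata)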

The second option is to tweak the model so that ArviZ believes there are no observed variables and therefore does not try to compute the likelihoods.

with pm.Model() as model:
    mu = pm.Normal('mu')
    sigma = pm.HalfNormal('sigma')
    pm.Normal('like', mu, sigma, observed=X)
    trace_X = pm.sample()
    
    Y_ = pm.Normal('Y_', mu, sigma)
    Y = pm.Deterministic('Y', Y_ + 2)
    
    ppc = pm.sample_posterior_predictive(trace_X, vars=[Y])

model.observed_RVs = []
az.summary(trace_X)

This will not make the log likelihood data available, but it does allow including the posterior predictive samples in the inference data object:

model.observed_RVs = []
idata = az.from_pymc3(trace_X, posterior_predictive=ppc)
az.summary(idata)

# access Y values with
idata.posterior_predictive.Y

# or directly plot their kde
az.plot_posterior(idata, group="posterior_predictive")

Converting “manually” to inference data has the advantage of avoiding the conversion every single time you call an ArviZ function, and it also lets you use custom named coordinates and dimensions and take advantage of all the groups. However, I think that none of the options above handles the constant_data group properly; I don’t think that is relevant here, though.
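To illustrate the custom coordinates and dimensions point, here is a minimal sketch for the first workaround (the coordinate name obs_id is arbitrary, and X is assumed to be a 1-d array of observations):

import numpy as np

# label the observations of 'like' with an arbitrary coordinate called 'obs_id'
idata = az.from_pymc3(
    trace_X,
    coords={"obs_id": np.arange(len(X))},
    dims={"like": ["obs_id"]},
)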

0 reactions
aakhmetz commented, Dec 1, 2019

@OriolAbril Thank you very much for your time! Yes, that looks reasonable. The command az.from_pymc3 is new to me, so I will try to read more about it.

After your reply, I realized that I also used separate blocks before:

with pm.Model() as model:
    mu = pm.Normal('mu')
    sigma = pm.HalfNormal('sigma')
    pm.Normal('like', mu, sigma, observed=X)
    trace_X = pm.sample()

display(pm.summary(trace_X))
    
with model:
    Y_ = pm.Normal('Y_', mu, sigma)
    Y = pm.Deterministic('Y', Y_ + 2)
    
    ppc = pm.sample_posterior_predictive(trace_X, vars=[Y])

which, I believe, should also work. But some time ago I adopted the shortcut shown above instead.

Thank you again for checking.
