problem while processing pymc3 data

I’m unable to sample a blackbox likelihood in pymc3. There is an exception raised when arviz is asked to process the trace. Based on the stacktrace I think the issue is on the arviz side.

Versions: Python 3.8.3, pymc3 3.9.2, arviz 0.9.0, theano 1.0.4, on macOS.

All packages were installed via pip into a clean conda env. The MWE is:

import numpy as np
import pymc3 as pm
import theano
import theano.tensor as tt
theano.config.exception_verbosity='high'


def line(theta, x, *args, **kwds):
    p_arr = np.squeeze(np.array(theta))
    return p_arr[1] + x * p_arr[0]


def my_loglike(theta, x, data, sigma):
    """
    A Gaussian log-likelihood function for a model with parameters given in theta
    """

    model = line(theta, x)
    return -(0.5/sigma**2)*np.sum((data - model)**2)


class LogLike(tt.Op):
    itypes = [tt.dvector] # expects a vector of parameter values when called
    otypes = [tt.dscalar] # outputs a single scalar value (the log likelihood)

    def __init__(self, loglike, data, x, sigma):

        # add inputs as class attributes
        self.likelihood = loglike
        self.x = x
        self.data=data
        self.sigma=sigma

    def perform(self, node, inputs, outputs):
        # the method that is used when calling the Op
        theta = inputs  # this will contain my variables
        # call the log-likelihood function
        logl = self.likelihood(theta, self.x, self.data, self.sigma)
        outputs[0][0] = np.array(logl)
           

# set up our data
N = 10  # number of data points
sigma = 1.  # standard deviation of noise
x = np.linspace(0., 9., N)

mtrue = 0.4  # true gradient
ctrue = 3.   # true y-intercept

truemodel = line([mtrue, ctrue], x)


# make data
np.random.seed(716742)  # set random seed, so the data is reproducible each time
data = sigma*np.random.randn(N) + truemodel

ndraws = 3000  # number of draws from the distribution
nburn = 1000   # number of "burn-in points" (which we'll discard)

logl = LogLike(my_loglike, data, x, sigma)

with pm.Model():
    # the external function takes two parameters, m and c, with Uniform priors
    m = pm.Uniform('m', lower=-10., upper=10.)
    c = pm.Uniform('c', lower=-10., upper=10.)

    # convert m and c to a tensor vector
    theta = tt.as_tensor_variable([m, c])

    # use a DensityDist (use a lambda function to "call" the Op)
    pm.DensityDist('likelihood', lambda v: logl(v), observed={'v': theta})
    trace = pm.sample(ndraws, tune=nburn, discard_tuned_samples=True)

This gives the stack trace:

---------------------------------------------------------------------------
MissingInputError                         Traceback (most recent call last)
<ipython-input-5-a6fb0239c4a6> in <module>
     72     # use a DensityDist (use a lamdba function to "call" the Op)
     73     pm.DensityDist('likelihood', lambda v: logl(v), observed={'v': theta})
---> 74     trace = pm.sample(ndraws, tune=nburn, discard_tuned_samples=True)

~/miniconda3/envs/dev3/lib/python3.8/site-packages/pymc3/sampling.py in sample(draws, step, init, n_init, start, trace, chain_idx, chains, cores, tune, progressbar, model, random_seed, discard_tuned_samples, compute_convergence_checks, callback, return_inferencedata, idata_kwargs, **kwargs)
    597         if idata_kwargs:
    598             ikwargs.update(idata_kwargs)
--> 599         idata = arviz.from_pymc3(trace, **ikwargs)
    600 
    601     if compute_convergence_checks:

~/miniconda3/envs/dev3/lib/python3.8/site-packages/arviz/data/io_pymc3.py in from_pymc3(trace, prior, posterior_predictive, log_likelihood, coords, dims, model, save_warmup)
    521     InferenceData
    522     """
--> 523     return PyMC3Converter(
    524         trace=trace,
    525         prior=prior,

~/miniconda3/envs/dev3/lib/python3.8/site-packages/arviz/data/io_pymc3.py in __init__(self, trace, prior, posterior_predictive, log_likelihood, predictions, coords, dims, model, save_warmup)
    157             self.dims = {**model_dims, **self.dims}
    158 
--> 159         self.observations, self.multi_observations = self.find_observations()
    160 
    161     def find_observations(self) -> Tuple[Optional[Dict[str, Var]], Optional[Dict[str, Var]]]:

~/miniconda3/envs/dev3/lib/python3.8/site-packages/arviz/data/io_pymc3.py in find_observations(self)
    170             elif hasattr(obs, "data"):
    171                 for key, val in obs.data.items():
--> 172                     multi_observations[key] = val.eval() if hasattr(val, "eval") else val
    173         return observations, multi_observations
    174 

~/miniconda3/envs/dev3/lib/python3.8/site-packages/theano/gof/graph.py in eval(self, inputs_to_values)
    520         inputs = tuple(sorted(inputs_to_values.keys(), key=id))
    521         if inputs not in self._fn_cache:
--> 522             self._fn_cache[inputs] = theano.function(inputs, self)
    523         args = [inputs_to_values[param] for param in inputs]
    524 

~/miniconda3/envs/dev3/lib/python3.8/site-packages/theano/compile/function.py in function(inputs, outputs, mode, updates, givens, no_default_updates, accept_inplace, name, rebuild_strict, allow_input_downcast, profile, on_unused_input)
    304         # note: pfunc will also call orig_function -- orig_function is
    305         #      a choke point that all compilation must pass through
--> 306         fn = pfunc(params=inputs,
    307                    outputs=outputs,
    308                    mode=mode,

~/miniconda3/envs/dev3/lib/python3.8/site-packages/theano/compile/pfunc.py in pfunc(params, outputs, mode, updates, givens, no_default_updates, accept_inplace, name, rebuild_strict, allow_input_downcast, profile, on_unused_input, output_keys)
    481         inputs.append(si)
    482 
--> 483     return orig_function(inputs, cloned_outputs, mode,
    484                          accept_inplace=accept_inplace, name=name,
    485                          profile=profile, on_unused_input=on_unused_input,

~/miniconda3/envs/dev3/lib/python3.8/site-packages/theano/compile/function_module.py in orig_function(inputs, outputs, mode, accept_inplace, name, profile, on_unused_input, output_keys)
   1830     try:
   1831         Maker = getattr(mode, 'function_maker', FunctionMaker)
-> 1832         m = Maker(inputs,
   1833                   outputs,
   1834                   mode,

~/miniconda3/envs/dev3/lib/python3.8/site-packages/theano/compile/function_module.py in __init__(self, inputs, outputs, mode, accept_inplace, function_builder, profile, on_unused_input, fgraph, output_keys, name)
   1484             # make the fgraph (copies the graph, creates NEW INPUT AND
   1485             # OUTPUT VARIABLES)
-> 1486             fgraph, additional_outputs = std_fgraph(inputs, outputs,
   1487                                                     accept_inplace)
   1488             fgraph.profile = profile

~/miniconda3/envs/dev3/lib/python3.8/site-packages/theano/compile/function_module.py in std_fgraph(input_specs, output_specs, accept_inplace)
    178     orig_outputs = [spec.variable for spec in output_specs] + updates
    179 
--> 180     fgraph = gof.fg.FunctionGraph(orig_inputs, orig_outputs,
    181                                   update_mapping=update_mapping)
    182 

~/miniconda3/envs/dev3/lib/python3.8/site-packages/theano/gof/fg.py in __init__(self, inputs, outputs, features, clone, update_mapping)
    173 
    174         for output in outputs:
--> 175             self.__import_r__(output, reason="init")
    176         for i, output in enumerate(outputs):
    177             output.clients.append(('output', i))

~/miniconda3/envs/dev3/lib/python3.8/site-packages/theano/gof/fg.py in __import_r__(self, variable, reason)
    344         # Imports the owners of the variables
    345         if variable.owner and variable.owner not in self.apply_nodes:
--> 346                 self.__import__(variable.owner, reason=reason)
    347         elif (variable.owner is None and
    348                 not isinstance(variable, graph.Constant) and

~/miniconda3/envs/dev3/lib/python3.8/site-packages/theano/gof/fg.py in __import__(self, apply_node, check, reason)
    389                                      "for more information on this error."
    390                                      % (node.inputs.index(r), str(node)))
--> 391                         raise MissingInputError(error_msg, variable=r)
    392 
    393         for node in new_nodes:

MissingInputError: Input 0 of the graph (indices start from 0), used to compute sigmoid(c_interval__), was not provided and not given a value. Use the Theano flag exception_verbosity='high', for more information on this error.

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments:9 (6 by maintainers)

Top GitHub Comments

1 reaction
andyfaff commented, Jul 5, 2020

Thank you for the detective work on this so far. pm.Potential works in the meantime.

I should’ve pointed out that the example comes from a pymc3 tutorial itself. I’ve been looking for exactly this kind of tutorial notebook for a long time, because black box likelihoods are quite common in the area in which I work. Once the problem has been resolved, the tutorial may need to be updated.

The NUTS initialisation fails because there is no gradient specified in the tensor op (so far).
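
To make the gradient remark concrete: the Gaussian log-likelihood in the MWE has a closed-form gradient with respect to m and c. The sketch below is pure Python and is not the tutorial's implementation; the helper names (`loglike`, `loglike_grad`, `finite_diff_grad`) and the noise-free data are assumptions made for illustration. It shows the analytic gradient next to a central-difference check, which is the kind of quantity a gradient-based sampler would need from the Op. Wiring it into Theano (through the Op's `grad` method) is not shown here.

```python
# Pure-Python sketch; none of these helper names come from pymc3 or theano.

def loglike(theta, x, data, sigma):
    """Gaussian log-likelihood for the line model m*x + c (same form as my_loglike)."""
    m, c = theta
    return -(0.5 / sigma**2) * sum((d - (m * xi + c)) ** 2 for xi, d in zip(x, data))


def loglike_grad(theta, x, data, sigma):
    """Analytic gradient [d logl / dm, d logl / dc]."""
    m, c = theta
    resid = [d - (m * xi + c) for xi, d in zip(x, data)]
    dm = sum(xi * r for xi, r in zip(x, resid)) / sigma**2
    dc = sum(resid) / sigma**2
    return [dm, dc]


def finite_diff_grad(f, theta, eps=1e-6):
    """Central-difference approximation, usable when no analytic form exists."""
    grad = []
    for i in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


x = [float(i) for i in range(10)]      # same grid as np.linspace(0., 9., 10)
data = [0.4 * xi + 3.0 for xi in x]    # noise-free data, for illustration only
theta = [0.5, 2.0]                     # arbitrary evaluation point (m, c)

analytic = loglike_grad(theta, x, data, 1.0)
numeric = finite_diff_grad(lambda t: loglike(t, x, data, 1.0), theta)
```

For a truly black-box likelihood only the finite-difference route is available, at the cost of two extra likelihood evaluations per parameter.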

1 reaction
OriolAbril commented, Jul 4, 2020

Thanks for reporting. I have to admit I am completely at a loss as to why the “observed data” has an eval method but can’t be evaluated. (In this case, because of the structure of the DensityDist, I think ArviZ treats theta as observed_data.) Maybe @rpgoldman has some idea on this? Maybe it’s because theta/v is “sampled” and thus has a different value for each draw and chain?

In the meantime I’d recommend using pm.Potential instead. Replacing the DensityDist call with the line below should fix the problem:

pm.Potential('likelihood', logl(theta))

I think both alternatives are equivalent implementations of the exact same model. The difference is that pm.Potential has no observed argument, so ArviZ does not try to retrieve the values passed to the observed kwarg and store them in the observed_data group.
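
One plausible reading of the traceback, sketched with toy stand-ins: the converter calls eval() on anything stored under observed that has such a method, and a symbolic value that still depends on sampled variables has the method but no fixed value to return. The classes and the function below are hypothetical and are not theano or arviz code; only the `val.eval() if hasattr(val, "eval") else val` branch is copied from the stack trace.

```python
# Toy stand-ins only: these classes are NOT theano or arviz objects.

class MissingInput(Exception):
    """Plays the role of theano's MissingInputError in this sketch."""


class FreeSymbolicVar:
    """A symbolic value that depends on sampled variables, so it has no fixed value."""

    def eval(self):
        raise MissingInput("graph input was not provided and not given a value")


class ObservedNode:
    """Hypothetical node carrying the dict passed to `observed=`."""

    def __init__(self, data):
        self.data = data


def collect_observations(nodes):
    """Mirror the branch shown in the traceback: call eval() whenever it exists."""
    collected = {}
    for node in nodes:
        for key, val in node.data.items():
            collected[key] = val.eval() if hasattr(val, "eval") else val
    return collected


# DensityDist-style model: theta is stored as "observed", so eval() is attempted.
try:
    collect_observations([ObservedNode({"v": FreeSymbolicVar()})])
    density_dist_ok = True
except MissingInput:
    density_dist_ok = False

# Potential-style model: nothing is registered as observed, so nothing is evaluated.
potential_result = collect_observations([])

# Plain (non-symbolic) observed data passes through untouched.
plain_result = collect_observations([ObservedNode({"y": [1.0, 2.0]})])
```

Under these assumptions the DensityDist-style case fails while the Potential-style case never reaches the eval() call, which matches the behaviour reported in the thread.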
