
Steps to troubleshoot if `fit` fails


👋 I’m really enjoying being able to fit Bayesian mixed-effects models with bambi, but my first experience was frustrating. I’ll walk through my thought process as an end user.

I followed the examples in the docs, but when I tried to fit a mixed-effects model to my data (CSV attached below), I hit the following error:

```python
from bambi import Model

model = Model("od ~ temp + (1|source) + 0", df)  # df: pandas DataFrame read from obs.csv
results = model.fit(draws=2000, chains=2)
```

/Users/camerondavidson-pilon/venvs/data/lib/python3.9/site-packages/pymc3/step_methods/hmc/quadpotential.py:224: RuntimeWarning: divide by zero encountered in true_divide
  np.divide(1, self._stds, out=self._inv_stds)
/Users/camerondavidson-pilon/venvs/data/lib/python3.9/site-packages/pymc3/step_methods/hmc/quadpotential.py:203: RuntimeWarning: invalid value encountered in multiply
  return np.multiply(self._var, x, out=out)
---------------------------------------------------------------------------
RemoteTraceback                           Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
  File "/Users/camerondavidson-pilon/venvs/data/lib/python3.9/site-packages/pymc3/parallel_sampling.py", line 137, in run
    self._start_loop()
  File "/Users/camerondavidson-pilon/venvs/data/lib/python3.9/site-packages/pymc3/parallel_sampling.py", line 191, in _start_loop
    point, stats = self._compute_point()
  File "/Users/camerondavidson-pilon/venvs/data/lib/python3.9/site-packages/pymc3/parallel_sampling.py", line 216, in _compute_point
    point, stats = self._step_method.step(self._point)
  File "/Users/camerondavidson-pilon/venvs/data/lib/python3.9/site-packages/pymc3/step_methods/arraystep.py", line 276, in step
    apoint, stats = self.astep(array)
  File "/Users/camerondavidson-pilon/venvs/data/lib/python3.9/site-packages/pymc3/step_methods/hmc/base_hmc.py", line 147, in astep
    self.potential.raise_ok(self._logp_dlogp_func._ordering.vmap)
  File "/Users/camerondavidson-pilon/venvs/data/lib/python3.9/site-packages/pymc3/step_methods/hmc/quadpotential.py", line 272, in raise_ok
    raise ValueError("\n".join(errmsg))
ValueError: Mass matrix contains zeros on the diagonal.
The derivative of RV `1|source_offset`.ravel()[0] is zero.
The derivative of RV `1|source_sigma_log__`.ravel()[0] is zero.
"""

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
ValueError: Mass matrix contains zeros on the diagonal.
The derivative of RV `1|source_offset`.ravel()[0] is zero.
The derivative of RV `1|source_sigma_log__`.ravel()[0] is zero.

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
<ipython-input-95-55d7fad1560b> in <module>
----> 1 results = model.fit(draws=2000, chains=2)

~/venvs/data/lib/python3.9/site-packages/bambi/models.py in fit(self, omit_offsets, backend, **kwargs)
    213             )
    214
--> 215         return self.backend.run(omit_offsets=omit_offsets, **kwargs)
    216
    217     def build(self, backend="pymc"):

~/venvs/data/lib/python3.9/site-packages/bambi/backends/pymc.py in run(self, start, method, init, n_init, omit_offsets, **kwargs)
    133             draws = kwargs.pop("draws", 1000)
    134             with model:
--> 135                 idata = pm.sample(
    136                     draws,
    137                     start=start,

~/venvs/data/lib/python3.9/site-packages/pymc3/sampling.py in sample(draws, step, init, n_init, start, trace, chain_idx, chains, cores, tune, progressbar, model, random_seed, discard_tuned_samples, compute_convergence_checks, callback, jitter_max_retries, return_inferencedata, idata_kwargs, mp_ctx, pickle_backend, **kwargs)
    557         _print_step_hierarchy(step)
    558         try:
--> 559             trace = _mp_sample(**sample_args, **parallel_args)
    560         except pickle.PickleError:
    561             _log.warning("Could not pickle model, sampling singlethreaded.")

~/venvs/data/lib/python3.9/site-packages/pymc3/sampling.py in _mp_sample(draws, tune, step, chains, cores, chain, random_seed, start, progressbar, trace, model, callback, discard_tuned_samples, mp_ctx, pickle_backend, **kwargs)
   1475         try:
   1476             with sampler:
-> 1477                 for draw in sampler:
   1478                     trace = traces[draw.chain - chain]
   1479                     if trace.supports_sampler_stats and draw.stats is not None:

~/venvs/data/lib/python3.9/site-packages/pymc3/parallel_sampling.py in __iter__(self)
    477
    478         while self._active:
--> 479             draw = ProcessAdapter.recv_draw(self._active)
    480             proc, is_last, draw, tuning, stats, warns = draw
    481             self._total_draws += 1

~/venvs/data/lib/python3.9/site-packages/pymc3/parallel_sampling.py in recv_draw(processes, timeout)
    357             else:
    358                 error = RuntimeError("Chain %s failed." % proc.chain)
--> 359             raise error from old_error
    360         elif msg[0] == "writing_done":
    361             proc._readable = True

RuntimeError: Chain 0 failed.
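For context, the two warnings at the top of that output and the final `ValueError` are connected. As we understand the PyMC3 3.x code, diagonal mass-matrix adaptation (`adapt_diag`) estimates a per-parameter variance from the draws seen during tuning and then divides by its square root. A parameter that never moves gets an estimated variance of zero, which produces the divide-by-zero warning and the "zeros on the diagonal" error, despite the message's wording about derivatives. The sketch below is a simplified NumPy illustration of that arithmetic only, not PyMC3's implementation; `diag_mass_matrix_check` and the parameter names are hypothetical.

```python
import numpy as np


def diag_mass_matrix_check(draws, names):
    """Hypothetical, simplified check: per-parameter variance of tuning draws."""
    var = draws.var(axis=0)  # one variance estimate per parameter (column)
    stds = np.sqrt(var)
    # Without errstate, NumPy would emit a "divide by zero" RuntimeWarning here,
    # like the first warning in the traceback.
    with np.errstate(divide="ignore"):
        inv_stds = 1.0 / stds
    zero = [names[i] for i in np.flatnonzero(stds == 0)]
    return inv_stds, zero


rng = np.random.default_rng(0)
draws = np.column_stack([
    rng.normal(size=100),  # a parameter that moves between draws
    np.full(100, 0.5),     # a parameter stuck at its starting value
])
inv_stds, zero = diag_mass_matrix_check(draws, ["moving_param", "stuck_param"])
print(inv_stds)  # second entry is inf
print(zero)      # ['stuck_param']
```

In this picture, the fix is not to change the mass matrix but to keep the chain from getting stuck in the first place, for example by starting it somewhere more reasonable.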

My next step was to scale the formula back to something simpler:

```python
model = Model("od ~ temp + 1", df)
results = model.fit(draws=2000, chains=2)
```

This failed in the same way. The error messages didn’t tell me much about what could be wrong, and googling the error suggested that a chain was falling into a bad region. Okay.

I inspected my data, and nothing about it looked off. I inspected the priors by printing `model`, and they seemed tighter than I was expecting:

Formula: od ~ temp + 1
Family name: Gaussian
Link: identity
Observations: 39
Priors:
  Intercept ~ Normal(mu: 0.03708401, sigma: 0.01820133)
  temp ~ Normal(mu: 0, sigma: 0.00049071)
  sigma ~ HalfStudentT(nu: 4, sigma: 0.00307659)

For a while I thought the problem might be here, so I tried widening the priors, with no effect.

On a whim, I passed `init="adapt_diag"` to `fit`, since I had seen it in other PyMC3 examples, and it worked: both models now ran successfully. I guess the jitter was pushing my (tiny) priors into bad regions?
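A quick back-of-the-envelope check is consistent with that guess. Assuming bambi leaves PyMC3's default NUTS initialization in place (`jitter+adapt_diag`, which PyMC3 documents as adding uniform jitter in [-1, 1] to each starting value), the sketch below compares that jitter width with the prior standard deviations printed above for `od ~ temp + 1`. It only does the arithmetic; it does not run bambi or PyMC3.

```python
# Prior standard deviations copied from the printed summary of "od ~ temp + 1".
prior_sigmas = {
    "Intercept": 0.01820133,
    "temp": 0.00049071,
}

# Assumption: jitter is uniform in [-1, 1], so a starting value can shift by up to 1.0.
# Intercept and temp are unbounded, so no transform is involved.
JITTER_HALF_WIDTH = 1.0

for name, sigma in prior_sigmas.items():
    ratio = JITTER_HALF_WIDTH / sigma
    print(f"{name}: up to {ratio:.0f} prior SDs away from the prior mean")
# Intercept: up to 55 prior SDs away from the prior mean
# temp: up to 2038 prior SDs away from the prior mean
```

Under that assumption, a jitter draw near ±1 can start `temp` roughly 2,000 prior standard deviations from its prior mean, far out in the tails. `adapt_diag` skips the jitter and starts every chain from the model's test point (usually the prior mean). `sigma` is left out because PyMC3 jitters it on the log scale, where ±1 is only a factor of e.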

I’m lucky that I’m familiar with how Bayesian inference works in the backend, but I imagine other users, who are attracted to bambi’s high-level API, have less experience and would probably have abandoned the package quickly. My suggestion would be some docs on troubleshooting a failing `fit`, or, even better, having bambi detect these problems and correct them auto-magically before running inference.
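One crude way to approximate the "detect and correct" idea from the user side is to catch a sampler-initialization failure and retry with a different `init`. This is a hypothetical helper, not a bambi feature, and it reacts after a failure rather than checking before inference. `fake_fit` stands in for `model.fit` so the sketch runs without bambi.

```python
from typing import Any, Callable, Sequence


def fit_with_fallback(
    fit: Callable[..., Any],
    inits: Sequence[str] = ("jitter+adapt_diag", "adapt_diag"),
    **fit_kwargs: Any,
) -> Any:
    """Hypothetical helper: try each `init` in order and return the first successful fit."""
    errors = []
    for init in inits:
        try:
            return fit(init=init, **fit_kwargs)
        except (RuntimeError, ValueError) as err:
            # Parallel sampling reported RuntimeError("Chain 0 failed.") chained from
            # the mass-matrix ValueError; catch both so single-core runs are covered too.
            errors.append(f"init={init!r}: {err}")
    raise RuntimeError("All initializations failed:\n" + "\n".join(errors))


# Stand-in for model.fit that fails whenever the starting point is jittered.
def fake_fit(init: str, draws: int) -> str:
    if init.startswith("jitter"):
        raise RuntimeError("Chain 0 failed.")
    return f"{draws} draws with init={init}"


print(fit_with_fallback(fake_fit, draws=2000))  # 2000 draws with init=adapt_diag
```

With bambi this would be called as `fit_with_fallback(model.fit, draws=2000, chains=2)`, assuming `Model.fit` forwards `init` to the backend as the `run(..., init, ...)` signature in the traceback suggests. A retry is the bluntest form of the idea: each failed attempt costs sampling time, and silently switching `init` can hide a badly scaled model, so the collected errors are worth logging.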

Dataset used: obs.csv


Top GitHub Comments


CamDavidsonPilon commented, Jul 28, 2021

Thanks team,

A solution like #383 is kinda what I was hoping for: a low-level, under-the-covers check for problems, with a provided solution, so users don’t need to go digging around Discourse or ipynb docs. This would likely have completely solved the problem I was having. I’m sure others will benefit, too.


tomicapretto commented, Aug 4, 2021

Feel free to reopen it if you run into any other problems with the sampler initialization.