Steps to troubleshoot if `fit` fails
👋 I’m really enjoying being able to fit Bayesian mixed-effects models with bambi, but my first experience was frustrating. I’ll walk through my thought process as an end-user.
I followed the examples in the docs, but when I went to fit a mixed-effects model to my data (CSV attached below), I hit the following error:
model = Model("od ~ temp + (1|source) + 0", df)
results = model.fit(draws=2000, chains=2)
/Users/camerondavidson-pilon/venvs/data/lib/python3.9/site-packages/pymc3/step_methods/hmc/quadpotential.py:224: RuntimeWarning: divide by zero encountered in true_divide
np.divide(1, self._stds, out=self._inv_stds)
/Users/camerondavidson-pilon/venvs/data/lib/python3.9/site-packages/pymc3/step_methods/hmc/quadpotential.py:203: RuntimeWarning: invalid value encountered in multiply
return np.multiply(self._var, x, out=out)
---------------------------------------------------------------------------
RemoteTraceback Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
File "/Users/camerondavidson-pilon/venvs/data/lib/python3.9/site-packages/pymc3/parallel_sampling.py", line 137, in run
self._start_loop()
File "/Users/camerondavidson-pilon/venvs/data/lib/python3.9/site-packages/pymc3/parallel_sampling.py", line 191, in _start_loop
point, stats = self._compute_point()
File "/Users/camerondavidson-pilon/venvs/data/lib/python3.9/site-packages/pymc3/parallel_sampling.py", line 216, in _compute_point
point, stats = self._step_method.step(self._point)
File "/Users/camerondavidson-pilon/venvs/data/lib/python3.9/site-packages/pymc3/step_methods/arraystep.py", line 276, in step
apoint, stats = self.astep(array)
File "/Users/camerondavidson-pilon/venvs/data/lib/python3.9/site-packages/pymc3/step_methods/hmc/base_hmc.py", line 147, in astep
self.potential.raise_ok(self._logp_dlogp_func._ordering.vmap)
File "/Users/camerondavidson-pilon/venvs/data/lib/python3.9/site-packages/pymc3/step_methods/hmc/quadpotential.py", line 272, in raise_ok
raise ValueError("\n".join(errmsg))
ValueError: Mass matrix contains zeros on the diagonal.
The derivative of RV `1|source_offset`.ravel()[0] is zero.
The derivative of RV `1|source_sigma_log__`.ravel()[0] is zero.
"""
The above exception was the direct cause of the following exception:
ValueError Traceback (most recent call last)
ValueError: Mass matrix contains zeros on the diagonal.
The derivative of RV `1|source_offset`.ravel()[0] is zero.
The derivative of RV `1|source_sigma_log__`.ravel()[0] is zero.
The above exception was the direct cause of the following exception:
RuntimeError Traceback (most recent call last)
<ipython-input-95-55d7fad1560b> in <module>
----> 1 results = model.fit(draws=2000, chains=2)
~/venvs/data/lib/python3.9/site-packages/bambi/models.py in fit(self, omit_offsets, backend, **kwargs)
213 )
214
--> 215 return self.backend.run(omit_offsets=omit_offsets, **kwargs)
216
217 def build(self, backend="pymc"):
~/venvs/data/lib/python3.9/site-packages/bambi/backends/pymc.py in run(self, start, method, init, n_init, omit_offsets, **kwargs)
133 draws = kwargs.pop("draws", 1000)
134 with model:
--> 135 idata = pm.sample(
136 draws,
137 start=start,
~/venvs/data/lib/python3.9/site-packages/pymc3/sampling.py in sample(draws, step, init, n_init, start, trace, chain_idx, chains, cores, tune, progressbar, model, random_seed, discard_tuned_samples, compute_convergence_checks, callback, jitter_max_retries, return_inferencedata, idata_kwargs, mp_ctx, pickle_backend, **kwargs)
557 _print_step_hierarchy(step)
558 try:
--> 559 trace = _mp_sample(**sample_args, **parallel_args)
560 except pickle.PickleError:
561 _log.warning("Could not pickle model, sampling singlethreaded.")
~/venvs/data/lib/python3.9/site-packages/pymc3/sampling.py in _mp_sample(draws, tune, step, chains, cores, chain, random_seed, start, progressbar, trace, model, callback, discard_tuned_samples, mp_ctx, pickle_backend, **kwargs)
1475 try:
1476 with sampler:
-> 1477 for draw in sampler:
1478 trace = traces[draw.chain - chain]
1479 if trace.supports_sampler_stats and draw.stats is not None:
~/venvs/data/lib/python3.9/site-packages/pymc3/parallel_sampling.py in __iter__(self)
477
478 while self._active:
--> 479 draw = ProcessAdapter.recv_draw(self._active)
480 proc, is_last, draw, tuning, stats, warns = draw
481 self._total_draws += 1
~/venvs/data/lib/python3.9/site-packages/pymc3/parallel_sampling.py in recv_draw(processes, timeout)
357 else:
358 error = RuntimeError("Chain %s failed." % proc.chain)
--> 359 raise error from old_error
360 elif msg[0] == "writing_done":
361 proc._readable = True
RuntimeError: Chain 0 failed.
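The RuntimeWarnings at the top of the log hint at the cause: a divide by zero in `np.divide(1, self._stds, ...)` means some of the adapted standard deviations on the sampler's diagonal mass matrix were exactly zero. Below is a simplified, hypothetical reconstruction of that kind of check, written with NumPy. `check_diag_mass_matrix` is an illustrative name, not PyMC3's code, and it assumes the adapted diagonal is just a vector of per-parameter variances.

```python
import numpy as np


def check_diag_mass_matrix(names, variances):
    """Raise if any adapted variance on the mass-matrix diagonal is zero.

    Hypothetical helper: a simplified stand-in for the sampler's own check,
    not PyMC3's implementation.
    """
    stds = np.sqrt(np.asarray(variances, dtype=float))
    zero = stds == 0
    if np.any(zero):
        # One line per offending parameter, mimicking the traceback above.
        lines = ["Mass matrix contains zeros on the diagonal."]
        lines += [f"The derivative of RV `{n}` is zero." for n, z in zip(names, zero) if z]
        raise ValueError("\n".join(lines))
    # No zero entries remain, so the inverse is finite for finite, non-negative input.
    return 1.0 / stds
```

Called with zero variances for `1|source_offset` and `1|source_sigma_log__`, it raises a ValueError naming just those two parameters, much like the traceback. In this simplified picture a zero entry means the coordinate showed no spread at all during tuning, which fits the "chain stuck in a bad region" explanation, although the check itself can't say why the chain got stuck.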
My next step was to simplify the formula:
model = Model("od ~ temp + 1", df)
results = model.fit(draws=2000, chains=2)
This failed in the same way. The error messages didn’t tell me much about what could be wrong, and googling the error suggested that a chain was falling into a bad region. Okay.
I inspected my data, and nothing about it looked off. I inspected the priors by printing the model, and they seemed tighter than I was expecting:
Formula: od ~ temp + 1
Family name: Gaussian
Link: identity
Observations: 39
Priors:
Intercept ~ Normal(mu: 0.03708401, sigma: 0.01820133)
temp ~ Normal(mu: 0, sigma: 0.00049071)
sigma ~ HalfStudentT(nu: 4, sigma: 0.00307659)
For a while I thought this was the problem, so I tried widening the priors, but that had no effect.
On a whim, I passed init="adapt_diag" to fit, since I had seen it in other PyMC3 examples, and it worked: both models now ran successfully. I guess the jitter was pushing my (tiny) priors into bad regions?
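The jitter guess can be sanity-checked with a little arithmetic. Assuming the default initialization adds Uniform(-1, 1) noise to each coordinate of the starting point (this is what PyMC3's `"jitter+adapt_diag"` does; that bambi's default `fit` uses it is an assumption here), the sketch below expresses the worst-case jitter in units of the prior standard deviations printed above. The helper names are hypothetical.

```python
# Prior standard deviations copied from the printed model above.
PRIOR_SD = {"Intercept": 0.01820133, "temp": 0.00049071}

# Assumption: the default init perturbs each coordinate of the starting
# point by Uniform(-1, 1) noise, so the worst case is a shift of 1.0.
MAX_JITTER = 1.0


def jitter_in_prior_sds(jitter, prior_sd):
    """Express a start-point shift in units of the prior standard deviation."""
    return abs(jitter) / prior_sd


def log_density_drop(jitter, prior_sd):
    """Drop in a Normal prior's log-density, relative to its mean, after the shift."""
    z = jitter_in_prior_sds(jitter, prior_sd)
    return 0.5 * z**2


for name, sd in PRIOR_SD.items():
    z = jitter_in_prior_sds(MAX_JITTER, sd)
    drop = log_density_drop(MAX_JITTER, sd)
    print(f"{name}: worst-case jitter = {z:,.0f} prior SDs, log-density drop = {drop:,.0f}")
```

With these priors the worst case is roughly 55 prior SDs for `Intercept` and roughly 2,000 for `temp`, so a jittered start can land deep in the tails, where the log-density and its gradient are enormous. That doesn't prove the jitter caused the failure, but it makes the guess plausible and is consistent with `init="adapt_diag"` (no jitter) working.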
I’m lucky to be familiar with how Bayesian inference works in the backend, but I imagine other users, who are attracted to bambi’s high-level API, have less experience and would probably have given up on the package quickly. My suggestion would be some docs on troubleshooting when fit fails, or even better, having bambi detect these problems and correct them auto-magically before inference is done.
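Until something like that exists, part of the "auto-magic" behavior can be approximated with a small wrapper that retries with the initialization that worked here. This is a hypothetical helper, not part of bambi’s API. It only assumes that `model.fit` accepts `init` as a keyword (as in the `init="adapt_diag"` call described above) and that a failed chain surfaces as a `RuntimeError`, as in the traceback.

```python
import warnings


def fit_with_fallback(model, fallback_init="adapt_diag", **fit_kwargs):
    """Fit with the caller's settings; on a failed chain, retry once with another init.

    Hypothetical helper, not part of bambi. `model` is anything exposing
    `fit(**kwargs)`, such as a bambi Model.
    """
    try:
        return model.fit(**fit_kwargs)
    except RuntimeError as err:
        if fit_kwargs.get("init") == fallback_init:
            raise  # The fallback was already in use, so there is nothing new to try.
        # Warn instead of switching silently, so the user knows the init changed.
        warnings.warn(
            f"fit failed ({err}); retrying with init={fallback_init!r}",
            RuntimeWarning,
        )
        return model.fit(**{**fit_kwargs, "init": fallback_init})
```

Usage would be `results = fit_with_fallback(model, draws=2000, chains=2)`. Retrying blindly can mask genuine model problems, which is why the wrapper warns on every fallback rather than changing the initialization silently.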
Dataset used: obs.csv
Issue created 2 years ago; 5 comments (2 by maintainers).

Follow-up comments:
Thanks team,
A solution like #383 is kinda what I was hoping for: a low-level, under-the-covers check for problems that supplies a fix, so users don’t need to go digging around Discourse or ipynb docs. This would likely have completely solved the problem I was having. I’m sure others will benefit, too.
Feel free to reopen it if you experience any other problems with the sampler initialization.