Unexpected behavior when using `pm.hpd` with multidimensional (ndim>2) posterior arrays
Describe the bug
I am using `hpd` on multi-dimensional posteriors with more than 2 dimensions. One example of where this can come up is when I use a 2D array to represent the coefficients for the interaction between levels of 2 different predictors. The output trace of this 2D array of coefficients from PyMC3 will then be 3D, with the 0th dimension representing the MCMC samples.
I find that with a 3D trace/array, `hpd` treats dimension 1 as the MCMC dimension (instead of dimension 0, the actual MCMC dimension).
To Reproduce
import numpy as np
import pymc3 as pm

# 12000 MCMC samples from a 10x3 dimensional distribution
a = np.random.normal(size=(12000, 10, 3))
pm.hpd(a).shape
Output: (3, 12000, 2)
Should instead be: (10, 3, 2)
This happens because the assumption in `hpd` is that `ndim==2` if `ndim>1`: https://github.com/arviz-devs/arviz/blob/master/arviz/stats/stats.py#L346
`for row in ary.T` works properly for arrays that have 2 dimensions, but for anything larger than that, it infers the wrong MCMC dimension.
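To make the shape problem concrete, here is a small NumPy-only sketch. It does not call arviz or PyMC3; `interval_1d` and `recursive_over_transpose` are hypothetical stand-ins that only mimic the "recurse over `ary.T`" pattern described above, with a placeholder (min, max) pair in place of the real interval computation.

```python
import numpy as np


def interval_1d(x):
    # Placeholder for a 1-D interval computation: returns two numbers.
    # (Hypothetical stand-in, not the real HPD calculation.)
    return np.array([x.min(), x.max()])


def recursive_over_transpose(ary):
    # Mimics the pattern "if ndim > 1, loop over ary.T and recurse".
    # This is an illustration of the pattern, not the arviz source.
    if ary.ndim > 1:
        return np.array([recursive_over_transpose(row) for row in ary.T])
    return interval_1d(ary)


a = np.random.normal(size=(12000, 10, 3))

# Transposing reverses all axes, so the sample axis ends up last.
print(a.T.shape)  # (3, 10, 12000)

# Each recursion level transposes again, so the 1-D slices that finally
# reach interval_1d have length 10, not 12000.
result = recursive_over_transpose(a)
print(result.shape)  # (3, 12000, 2)
```

With a 2D input of shape (12000, 10), the same pattern hands slices of length 12000 to the 1-D function, which is why the problem only shows up from 3 dimensions onward.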
Suggestion: Would it make sense to have an `axis` parameter to specify the axis of the MCMC samples, much like, say, `np.mean` etc.?
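A rough sketch of what such an `axis` parameter could look like, written with plain NumPy. The names `hpd_1d` and `hpd_along_axis` and the `axis` argument are assumptions made for illustration; they are not part of the arviz or PyMC3 API. The 1-D helper uses the common "narrowest interval over sorted samples" approach, which may differ in detail from the library's implementation.

```python
import numpy as np


def hpd_1d(samples, credible_interval=0.94):
    # Narrowest interval containing roughly `credible_interval` of the samples.
    sorted_samples = np.sort(samples)
    n = len(sorted_samples)
    n_included = int(np.floor(credible_interval * n))
    n_intervals = n - n_included
    if n_intervals < 1:
        raise ValueError("Too few samples for the requested interval.")
    # Width of every candidate interval spanning n_included samples.
    widths = sorted_samples[n_included:] - sorted_samples[:n_intervals]
    start = np.argmin(widths)
    return np.array([sorted_samples[start], sorted_samples[start + n_included]])


def hpd_along_axis(ary, credible_interval=0.94, axis=0):
    # Hypothetical wrapper: `axis` names the MCMC sample dimension.
    ary = np.asarray(ary)
    # Move the sample axis to the end so the (lower, upper) pair replaces it.
    moved = np.moveaxis(ary, axis, -1)
    return np.apply_along_axis(hpd_1d, -1, moved, credible_interval)


rng = np.random.default_rng(0)
a = rng.normal(size=(12000, 10, 3))
out = hpd_along_axis(a, axis=0)
print(out.shape)  # (10, 3, 2)
```

Putting the bounds on the last axis keeps the remaining dimensions in their original order, so `out[i, j]` is the interval for coefficient `(i, j)`.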
I'd like to work on this.
@AlexAndorra Yeah, I used the concatenation and transposition idea that you outlined for my use case, but I'm wondering if it would be good to have some sort of general structure in arviz that clears up the confusion that might be caused by multiple array dimensions.