add test for from_cmdstanpy to distinguish between vectors of length 1 and scalars
EDIT (@OriolAbril): See https://github.com/arviz-devs/arviz/issues/1646#issuecomment-1125464400 for an up-to-date description of the issue.
Describe the bug
When fitting a model with cmdstanpy and reading it into an arviz.InferenceData object, I sometimes end up with a vector parameter of length 1.
When this happens, the io_cmdstanpy code incorrectly drops the last dimension, which causes an error when I provide coords/dims for that parameter.
To Reproduce
I have created a reproducible example here:
# %%
import cmdstanpy
import arviz as az
# %% [markdown]
# # Review Stan model code
# %%
model_code = '''
// from https://arviz-devs.github.io/arviz/notebooks/InferenceDataCookbook.html#From-CmdStanPy
data {
  int<lower=0> J;
  real y[J];
  real<lower=0> sigma[J];
  int<lower = 0, upper=1> prior_only;
}
parameters {
  real mu;
  real<lower=0> tau;
  real theta_tilde[J];
}
transformed parameters {
  real theta[J];
  for (j in 1:J)
    theta[j] = mu + tau * theta_tilde[j];
}
model {
  mu ~ normal(0, 5);
  tau ~ cauchy(0, 5);
  theta_tilde ~ normal(0, 1);
  if (prior_only == 0) {
    y ~ normal(theta, sigma);
  }
}
generated quantities {
  vector[J] log_lik;
  vector[J] y_hat;
  for (j in 1:J) {
    log_lik[j] = normal_lpdf(y[j] | theta[j], sigma[j]);
    y_hat[j] = normal_rng(theta[j], sigma[j]);
  }
}
'''
# %% [markdown]
# # Fitting the model using cmdstanpy
# %%
stan_file = "eight_schools.stan"
# Mode 'x' creates the file and raises FileExistsError if it already exists.
with open(stan_file, 'x') as f:
    f.write(model_code)
model = cmdstanpy.CmdStanModel(stan_file=stan_file)
# %%
stan_data8 = dict(
    J=8,
    y=[28, 8, -3, 7, -1, 1, 18, 12],
    sigma=[15, 10, 16, 11, 9, 11, 10, 18],
    prior_only=0,
)
stan_data1 = dict(
    J=1,
    y=[28],
    sigma=[15],
    prior_only=0,
)
coords8 = dict(school=["A", "B", "C", "D", "E", "F", "G", "H"])
coords1 = dict(school=["a"])
dims = dict(
    theta=["school"],
    y=["school"],
    log_lik=["school"],
    y_hat=["school"],
    theta_tilde=["school"],
)
# %%
fit8 = model.sample(data=stan_data8)
fit1 = model.sample(data=stan_data1)
# %% [markdown]
# ## Construct arviz.InferenceData object
# %%
# The J=8 case works.
idata8 = az.from_cmdstanpy(fit8, coords=coords8, dims=dims,
                           posterior_predictive=["y_hat"],
                           prior_predictive=["y_hat"],
                           log_likelihood=["log_lik"])
# The J=1 case fails!
idata1 = az.from_cmdstanpy(fit1, coords=coords1, dims=dims,
                           posterior_predictive=["y_hat"],
                           prior_predictive=["y_hat"],
                           log_likelihood=["log_lik"])
The error looks like this:
/Users/jburos/.local/share/virtualenvs/workflow2-PgZLfFHB/src/arviz/arviz/data/base.py:129: UserWarning: In variable theta_tilde, there are more dims (1) given than exist (0). Passed array should have shape (chain,draw, *shape)
warnings.warn(
Traceback (most recent call last):
File "/Users/jburos/projects/workflow2/docs/test_arviz_cmdstanpy.py", line 93, in <module>
idata1 = az.from_cmdstanpy(fit1, coords = coords1, dims = dims,
File "/Users/jburos/.local/share/virtualenvs/workflow2-PgZLfFHB/src/arviz/arviz/data/io_cmdstanpy.py", line 763, in from_cmdstanpy
return CmdStanPyConverter(
File "/Users/jburos/.local/share/virtualenvs/workflow2-PgZLfFHB/src/arviz/arviz/data/io_cmdstanpy.py", line 408, in to_inference_data
"posterior": self.posterior_to_xarray(),
File "/Users/jburos/.local/share/virtualenvs/workflow2-PgZLfFHB/src/arviz/arviz/data/base.py", line 64, in wrapped
return func(cls)
File "/Users/jburos/.local/share/virtualenvs/workflow2-PgZLfFHB/src/arviz/arviz/data/io_cmdstanpy.py", line 111, in posterior_to_xarray
dict_to_dataset(
File "/Users/jburos/.local/share/virtualenvs/workflow2-PgZLfFHB/src/arviz/arviz/data/base.py", line 302, in dict_to_dataset
data_vars[key] = numpy_to_data_array(
File "/Users/jburos/.local/share/virtualenvs/workflow2-PgZLfFHB/src/arviz/arviz/data/base.py", line 249, in numpy_to_data_array
return xr.DataArray(ary, coords=coords, dims=dims)
File "/Users/jburos/.local/share/virtualenvs/workflow2-PgZLfFHB/lib/python3.8/site-packages/xarray/core/dataarray.py", line 409, in __init__
coords, dims = _infer_coords_and_dims(data.shape, coords, dims)
File "/Users/jburos/.local/share/virtualenvs/workflow2-PgZLfFHB/lib/python3.8/site-packages/xarray/core/dataarray.py", line 126, in _infer_coords_and_dims
raise ValueError(
ValueError: different number of dimensions on data and dims: 2 vs 3
deleting tmpfiles dir: /var/folders/18/st_852gs77d86c3srp0jgrnw0000gn/T/tmpfmrn2a4u
done
I think this is happening in this line: https://github.com/arviz-devs/arviz/blob/main/arviz/data/io_cmdstanpy.py#L594-L595
Expected behavior
The from_cmdstanpy function should ideally know (from the dims) that this is a 1-d vector or array, and label it as such.
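As a rough illustration of the expected behavior, the sketch below infers each variable's shape from flattened column names, so that a length-1 vector keeps its trailing dimension. It is not the ArviZ implementation: `infer_shapes` is a hypothetical helper, and it assumes columns are labelled `name` for scalars and `name[i]` or `name[i,j]` (1-based) for array elements.

```python
import re

# Hypothetical helper, not part of ArviZ or CmdStanPy.
# Assumption: scalar columns are named "mu", array elements "theta[1]", "z[2,3]".
_INDEXED = re.compile(r"^(?P<name>[^\[\]]+)\[(?P<idx>[\d,]+)\]$")


def infer_shapes(column_names):
    """Map each variable name to its shape, excluding chain and draw."""
    shapes = {}
    for col in column_names:
        match = _INDEXED.match(col)
        if match is None:
            # No index suffix: treat the column as a true scalar.
            shapes[col] = ()
            continue
        name = match.group("name")
        idx = tuple(int(i) for i in match.group("idx").split(","))
        # Track the largest 1-based index seen along every axis.
        prev = shapes.get(name, (0,) * len(idx))
        shapes[name] = tuple(max(a, b) for a, b in zip(prev, idx))
    return shapes


# Columns as they might look for the J=1 fit in the example above.
columns_j1 = ["mu", "tau", "theta_tilde[1]", "theta[1]"]
shapes = infer_shapes(columns_j1)
# "mu" maps to () while "theta_tilde" maps to (1,), so the two stay distinct.
```

One limitation of this approach: a length-0 array produces no columns at all, so it cannot be recovered from column names alone and would need the declared dimensions from the fit's metadata.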
Additional context
arviz version: 0.11.2
(github hash: 23e14fb645431014e73ba35f940b613850d59f30)
cmdstanpy version: 0.9.68
python version: 3.8.7 (default, Feb 3 2021, 06:31:03) \n[Clang 12.0.0 (clang-1200.0.32.29)]
And we should probably also test length-0 arrays for the sampling wrappers and reloo.
There was probably some reason for the existing logic.
The code should check whether the variable is a 'scalar' or an nD object.
@mitzimorris @OriolAbril
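The scalar-versus-nD check suggested in the comments can be sketched as follows. The point it tries to make is that a scalar and a length-1 vector both occupy exactly one output column, so the column count cannot tell them apart; only the declared dimensions can. The function names here are hypothetical, and the sketch assumes the declared dimensions of each Stan variable are available as a tuple (empty for scalars).

```python
from math import prod


def n_columns(stan_dims):
    """Number of flattened output columns for a variable with these dims."""
    # prod(()) is 1, so a scalar and a length-1 vector both give 1 column.
    return prod(stan_dims)


def is_scalar(stan_dims):
    """A variable is scalar only if it was declared with no dimensions."""
    return len(stan_dims) == 0


def target_shape(n_chains, n_draws, stan_dims):
    """Shape the draws should have once reshaped: (chain, draw, *shape)."""
    return (n_chains, n_draws, *stan_dims)


# Hypothetical declared dims: a scalar, a length-1 vector, a length-0 vector.
declared = {"mu": (), "theta_tilde": (1,), "empty": (0,)}
summary = {
    name: (n_columns(dims), is_scalar(dims), target_shape(4, 1000, dims))
    for name, dims in declared.items()
}
```

With this convention, `theta_tilde` keeps a trailing axis of size 1 that a `school` dimension can label, and a length-0 array keeps an axis of size 0 instead of disappearing.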