add test for from_cmdstanpy to distinguish between vectors of length 1 and scalars

EDIT (@OriolAbril): See https://github.com/arviz-devs/arviz/issues/1646#issuecomment-1125464400 for an up to date description of the issue.

Describe the bug

When fitting a model using cmdstanpy and reading it into an arviz.InferenceData object, I sometimes end up with a vector parameter with length 1.

When this happens, the io_cmdstanpy code incorrectly ignores the last dimension, causing an error when I provide coord/dims for that parameter.
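To make the ambiguity concrete, here is a minimal sketch using plain NumPy arrays. It is not the actual io_cmdstanpy code; the array sizes are arbitrary, and `np.squeeze` is only a stand-in for whatever step drops the trailing axis.

```python
import numpy as np

n_chains, n_draws = 4, 1000  # arbitrary sizes, chosen for illustration

# Draws for a scalar parameter such as mu: shape (chain, draw).
mu = np.zeros((n_chains, n_draws))

# Draws for theta_tilde when J=1: shape (chain, draw, 1).
theta_tilde = np.zeros((n_chains, n_draws, 1))

# Dropping the trailing length-1 axis makes the vector look like a scalar.
# np.squeeze is a stand-in here, not necessarily what the converter does.
collapsed = np.squeeze(theta_tilde, axis=-1)

print(mu.shape)           # (4, 1000)
print(theta_tilde.shape)  # (4, 1000, 1)
print(collapsed.shape)    # (4, 1000)

# dims=["school"] implies three dimensions in total: chain, draw, school.
expected_ndim = 2 + len(["school"])
print(collapsed.ndim, "vs", expected_ndim)  # 2 vs 3
```

Once the trailing axis is gone, the array shape alone can no longer tell a length-1 vector apart from a scalar, which is consistent with the "2 vs 3" mismatch in the traceback further down.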

To Reproduce

I have created a reproducible example here:

# To add a new cell, type '# %%'
# To add a new markdown cell, type '# %% [markdown]'
# %%

import cmdstanpy
import arviz as az

# %% [markdown]
# # Review Stan model code

# %%
model_code = '''
// from https://arviz-devs.github.io/arviz/notebooks/InferenceDataCookbook.html#From-CmdStanPy
data {
    int<lower=0> J;
    real y[J];
    real<lower=0> sigma[J];
    int<lower = 0, upper=1> prior_only;
}
parameters {
    real mu;
    real<lower=0> tau;
    real theta_tilde[J];
}
transformed parameters {
    real theta[J];
    for (j in 1:J)
        theta[j] = mu + tau * theta_tilde[j];
}
model {
    mu ~ normal(0, 5);
    tau ~ cauchy(0, 5);
    theta_tilde ~ normal(0, 1);
    if (prior_only == 0) {
        y ~ normal(theta, sigma);
    }
}
generated quantities {
    vector[J] log_lik;
    vector[J] y_hat;
    for (j in 1:J) {
        log_lik[j] = normal_lpdf(y[j] | theta[j], sigma[j]);
        y_hat[j] = normal_rng(theta[j], sigma[j]);
    }
}
'''

# %% [markdown]
# # Fitting the model using cmdstanpy

# %%
stan_file = "eight_schools.stan"
with open(stan_file, 'w') as f:  # 'w' rather than 'x' so the script can be rerun
    f.write(model_code)
model = cmdstanpy.CmdStanModel(stan_file = stan_file)


# %%
stan_data8 = dict(
            J=8,
            y=[28, 8, -3, 7, -1, 1, 18, 12],
            sigma=[15, 10, 16, 11, 9, 11, 10, 18],
            prior_only=0,
        )
stan_data1 = dict(
            J=1,
            y=[28],
            sigma=[15],
            prior_only=0,
        )
coords8 = dict(school=["A", "B", "C", "D", "E", "F", "G", "H"])
coords1 = dict(school=["a"])
dims= dict(
        theta=["school"],
        y=["school"],
        log_lik=["school"],
        y_hat=["school"],
        theta_tilde=["school"],
    )
# %%
fit8 = model.sample(data=stan_data8)
fit1 = model.sample(data=stan_data1)

# %% [markdown]
# ## Construct arviz.InferenceData object

# %%
idata8 = az.from_cmdstanpy(fit8, coords = coords8, dims = dims,
    posterior_predictive = ["y_hat"],
    prior_predictive = ['y_hat'],
    log_likelihood = ["log_lik"])

## fails!
idata1 = az.from_cmdstanpy(fit1, coords = coords1, dims = dims,
    posterior_predictive = ["y_hat"],
    prior_predictive = ['y_hat'],
    log_likelihood = ["log_lik"])

The error looks like this:

/Users/jburos/.local/share/virtualenvs/workflow2-PgZLfFHB/src/arviz/arviz/data/base.py:129: UserWarning: In variable theta_tilde, there are more dims (1) given than exist (0). Passed array should have shape (chain,draw, *shape)
  warnings.warn(
Traceback (most recent call last):
  File "/Users/jburos/projects/workflow2/docs/test_arviz_cmdstanpy.py", line 93, in <module>
    idata1 = az.from_cmdstanpy(fit1, coords = coords1, dims = dims,
  File "/Users/jburos/.local/share/virtualenvs/workflow2-PgZLfFHB/src/arviz/arviz/data/io_cmdstanpy.py", line 763, in from_cmdstanpy
    return CmdStanPyConverter(
  File "/Users/jburos/.local/share/virtualenvs/workflow2-PgZLfFHB/src/arviz/arviz/data/io_cmdstanpy.py", line 408, in to_inference_data
    "posterior": self.posterior_to_xarray(),
  File "/Users/jburos/.local/share/virtualenvs/workflow2-PgZLfFHB/src/arviz/arviz/data/base.py", line 64, in wrapped
    return func(cls)
  File "/Users/jburos/.local/share/virtualenvs/workflow2-PgZLfFHB/src/arviz/arviz/data/io_cmdstanpy.py", line 111, in posterior_to_xarray
    dict_to_dataset(
  File "/Users/jburos/.local/share/virtualenvs/workflow2-PgZLfFHB/src/arviz/arviz/data/base.py", line 302, in dict_to_dataset
    data_vars[key] = numpy_to_data_array(
  File "/Users/jburos/.local/share/virtualenvs/workflow2-PgZLfFHB/src/arviz/arviz/data/base.py", line 249, in numpy_to_data_array
    return xr.DataArray(ary, coords=coords, dims=dims)
  File "/Users/jburos/.local/share/virtualenvs/workflow2-PgZLfFHB/lib/python3.8/site-packages/xarray/core/dataarray.py", line 409, in __init__
    coords, dims = _infer_coords_and_dims(data.shape, coords, dims)
  File "/Users/jburos/.local/share/virtualenvs/workflow2-PgZLfFHB/lib/python3.8/site-packages/xarray/core/dataarray.py", line 126, in _infer_coords_and_dims
    raise ValueError(
ValueError: different number of dimensions on data and dims: 2 vs 3
deleting tmpfiles dir: /var/folders/18/st_852gs77d86c3srp0jgrnw0000gn/T/tmpfmrn2a4u
done

I think this is happening in these lines: https://github.com/arviz-devs/arviz/blob/main/arviz/data/io_cmdstanpy.py#L594-L595

Expected behavior

The from_cmdstanpy method should ideally know (from the dims) that this is a 1d vector or array, and label it as such.
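One way this could look is sketched below. It is an illustration only, not ArviZ's implementation: `pad_to_dims` is a hypothetical helper, and it assumes that any axis missing relative to the user-supplied dims has length 1.

```python
import numpy as np


def pad_to_dims(ary, var_dims, n_sample_dims=2):
    """Append length-1 axes until ary has one axis per sample dim and named dim.

    Hypothetical helper: it assumes every missing axis has length 1.
    """
    expected_ndim = n_sample_dims + len(var_dims)
    if ary.ndim > expected_ndim:
        raise ValueError(
            f"array has {ary.ndim} dimensions but dims imply only {expected_ndim}"
        )
    while ary.ndim < expected_ndim:
        ary = ary[..., np.newaxis]
    return ary


# Both variables arrive with shape (chain, draw); only theta_tilde has dims.
dims = {"theta_tilde": ["school"]}
draws = {"mu": np.zeros((4, 1000)), "theta_tilde": np.zeros((4, 1000))}

fixed = {name: pad_to_dims(ary, dims.get(name, [])) for name, ary in draws.items()}

print(fixed["mu"].shape)           # (4, 1000)
print(fixed["theta_tilde"].shape)  # (4, 1000, 1)
```

Relying on dims alone only helps when the user passes dims for the variable, so it complements rather than replaces a check on the variable's declared shape.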

Additional context

arviz version: 0.11.2 (github hash: 23e14fb645431014e73ba35f940b613850d59f30)
cmdstanpy version: 0.9.68
python version: 3.8.7 (default, Feb 3 2021, 06:31:03) [Clang 12.0.0 (clang-1200.0.32.29)]

Issue Analytics

Created 2 years ago
Comments: 10 (8 by maintainers)

Top GitHub Comments

1 reaction

OriolAbril commented, Mar 30, 2021

And we should probably also test length-0 arrays for the sampling wrappers and reloo.

1 reaction

ahartikainen commented, Mar 30, 2021

There was probably some reason for the logic.

The code should check whether the variable is a ‘scalar’ or an nD object.

@mitzimorris @OriolAbril
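A rough sketch of such a check, assuming column names of the form `name` for scalars and `name[i]` or `name[i,j]` for indexed elements. `infer_shapes` is a hypothetical helper, not part of ArviZ or cmdstanpy.

```python
import re

# Matches "mu" as well as "theta[1]" or "cov[2,3]" (assumed naming scheme).
_COLUMN = re.compile(r"^(?P<name>[^\[]+)(?:\[(?P<idx>[\d,]+)\])?$")


def infer_shapes(column_names):
    """Map each base variable name to its shape; scalars map to ()."""
    shapes = {}
    for col in column_names:
        match = _COLUMN.match(col)
        if match is None:
            raise ValueError(f"unrecognised column name: {col!r}")
        name, idx = match["name"], match["idx"]
        if idx is None:
            shapes[name] = ()  # no index at all: a true scalar
        else:
            index = tuple(int(i) for i in idx.split(","))
            prev = shapes.get(name, (0,) * len(index))
            # Indices are 1-based, so the largest index seen is the length.
            shapes[name] = tuple(max(p, i) for p, i in zip(prev, index))
    return shapes


columns = ["mu", "tau", "theta_tilde[1]", "theta[1]"]
shapes = infer_shapes(columns)

print(shapes["mu"])           # ()
print(shapes["theta_tilde"])  # (1,)
```

Under this naming assumption, the presence of an index in the name separates a length-1 vector from a scalar even though both contribute a single column. A length-0 array would contribute no columns at all, so it would need a different source of shape information.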
