ValueError: Dataset does not contain the dimensions: {'y_mean_obs'}
OS: Linux
Bambi: 0.7.1
Python: 3.8
Issue: I am receiving the above error when Bambi model and idata objects are stored inside a dictionary or list. When they are not stored inside a collection, the expected behavior is observed. This is a recent issue since installing Bambi 0.7.1 (my code worked previously).
function code ...
....
# Define model parameters
params = {
    'family': 'bernoulli',
    'chains': 3,
    'draws': 10,
    'tune': 10}

models = []
classifiers = []
predictions = []  # not shown in the original snippet; needed for predictions.append below
for i in range(len(train_splits)):
    print("\nComputing predictions for sampling run {}".format(i + 1))
    x_train, y_train = train_splits[i]
    x_test, y_test = test_splits[i]

    # Run bambi model
    x_train['y'] = y_train.values

    # Get the function formula
    f = get_formula(x_train.columns[:-1])
    model = bmb.Model(f, x_train, family=params['family'])
    clf = model.fit(draws=params['draws'], tune=params['tune'],
                    chains=params['chains'], init='auto')
    models.append(model)
    classifiers.append(clf)

    # Run predictions
    idata = model.predict(clf, data=x_test, inplace=False)
    mean_preds = idata.posterior["y_mean"].values
    predictions.append(mean_preds)

# Collect outputs
output_dict = {
    'predictions': predictions,
    'models': models,
    'classifiers': classifiers,
    'train_splits': train_splits,
    'test_splits': test_splits
}
return output_dict
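For context, get_formula is a user-defined helper that is not shown in the issue. A minimal sketch of what it might look like, assuming it simply joins the predictor column names into a Wilkinson-style formula string (the name, signature, and behavior here are assumptions, not code from the issue):

def get_formula(columns, response="y"):
    # Hypothetical helper (not part of the issue): builds "y ~ x1 + x2 + ..."
    # from the predictor column names.
    return "{} ~ {}".format(response, " + ".join(str(c) for c in columns))

# e.g. get_formula(['fusion', 'cooccurence']) -> 'y ~ fusion + cooccurence'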
With a single iteration I get a model of:
print(output['models'][0])
output:
Formula: y ~ neighborhood_transferred + fusion + cooccurence + coexpression + coexpression_transferred + experiments + experiments_transferred + database + database_transferred + textmining + textmining_transferred
Family name: Bernoulli
Link: logit
Observations: 127151
Priors:
  Common-level effects
    Intercept ~ Normal(mu: 0, sigma: 6.6279)
    neighborhood_transferred ~ Normal(mu: 0.0, sigma: 3.5808)
    fusion ~ Normal(mu: 0.0, sigma: 2.596)
    cooccurence ~ Normal(mu: 0.0, sigma: 3.6322)
    coexpression ~ Normal(mu: 0.0, sigma: 2.9462)
    coexpression_transferred ~ Normal(mu: 0.0, sigma: 2.9241)
    experiments ~ Normal(mu: 0.0, sigma: 2.6692)
    experiments_transferred ~ Normal(mu: 0.0, sigma: 2.8198)
    database ~ Normal(mu: 0.0, sigma: 2.9285)
    database_transferred ~ Normal(mu: 0.0, sigma: 2.5707)
    textmining ~ Normal(mu: 0.0, sigma: 3.5179)
    textmining_transferred ~ Normal(mu: 0.0, sigma: 3.7341)
And an Idata of:
print(output['classifiers'][0])

When trying to make a prediction on new data with the exact same column names as the training data, running:
output['models'][0].predict(idata=output['classifiers'][0], data=x, inplace=False)
output:
/mnt/mnemo5/sum02dean/sl_projects/handover/STRINGSCORE/src/scripts/nb.ipynb Cell 4' in <cell line: 1>()
----> 1 output['models'][0].predict(idata=output['classifiers'][0], data=x, inplace=False)

File ~/miniconda3/envs/string-score-2.0/lib/python3.8/site-packages/bambi/models.py:897, in Model.predict(self, idata, kind, data, draws, inplace)
    892 # 'linear_predictor' is of shape
    893 # * (chain_n, draw_n, obs_n) for univariate models
    894 # * (chain_n, draw_n, response_n, obs_n) for multivariate models
    896 if kind == "mean":
--> 897     idata.posterior = self.family.predict(self, posterior, linear_predictor)
    898 else:
    899     pps_kwargs = {
    900         "model": self,
    901         "posterior": posterior,
    (...)
    904         "draw_n": draw_n,
    905     }

File ~/miniconda3/envs/string-score-2.0/lib/python3.8/site-packages/bambi/families/univariate.py:18, in UnivariateFamily.predict(self, model, posterior, linear_predictor)
     16 # Drop var/dim if already present
     17 if name in posterior.data_vars:
---> 18     posterior = posterior.drop_vars(name).drop_dims(coord_name)
     20 coords = ("chain", "draw", coord_name)
     21 posterior[name] = (coords, mean)

File ~/miniconda3/envs/string-score-2.0/lib/python3.8/site-packages/xarray/core/dataset.py:4602, in Dataset.drop_dims(self, drop_dims, errors)
   4600 missing_dims = drop_dims - set(self.dims)
   4601 if missing_dims:
-> 4602     raise ValueError(
   4603         f"Dataset does not contain the dimensions: {missing_dims}"
   4604     )
   4606 drop_vars = {k for k, v in self._variables.items() if set(v.dims) & drop_dims}
   4607 return self.drop_vars(drop_vars)

ValueError: Dataset does not contain the dimensions: {'y_mean_obs'}
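For reference, the failing call at the bottom of the traceback is a plain xarray operation: Dataset.drop_dims raises exactly this error when the requested dimension is absent. A minimal, self-contained illustration (the variable and dimension names are chosen to mirror the traceback, not taken from Bambi internals):

import numpy as np
import xarray as xr

# A posterior-like Dataset whose observation dimension is named "obs",
# not "y_mean_obs".
posterior = xr.Dataset(
    {"y_mean": (("chain", "draw", "obs"), np.random.rand(3, 10, 5))}
)

# Asking xarray to drop a dimension the Dataset does not contain raises the
# same ValueError reported above.
posterior.drop_dims("y_mean_obs")
# ValueError: Dataset does not contain the dimensions: {'y_mean_obs'}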
Accessing the Bambi-specific objects via list indexing or dictionary lookup causes the issue. When not using an iterable or collection, the code works fine:
# Get the function formula
xt = output['train_splits'][0][0]  # <--- train features
y = output['train_splits'][0][1]   # <--- train labels
xt['y'] = y.values
f = get_formula(xt.columns[:-1])
print(f)

model = bmb.Model(f, xt, family='bernoulli')
clf = model.fit(draws=10, tune=10, chains=3, init='auto')
model.predict(idata=clf, data=x, inplace=False)
print("ran without errors")
output:
y ~ neighborhood_transferred + fusion + cooccurence + coexpression + coexpression_transferred + experiments + experiments_transferred + database + database_transferred + textmining
Modeling the probability that y==1
Only 10 samples in chain.
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (3 chains in 4 jobs)
NUTS: [textmining, database_transferred, database, experiments_transferred, experiments, coexpression_transferred, coexpression, cooccurence, fusion, neighborhood_transferred, Intercept]
100.00% [60/60 00:01<00:00 Sampling 3 chains, 0 divergences]
Sampling 3 chains for 10 tune and 10 draw iterations (30 + 30 draws total) took 2 seconds.
/mnt/mnemo5/sum02dean/miniconda3/envs/string-score-2.0/lib/python3.8/site-packages/pymc3/sampling.py:643: UserWarning: The number of samples is too small to check convergence reliably.
warnings.warn("The number of samples is too small to check convergence reliably.")
Ran without errors
Thank you @tomicapretto, this solved the issue for me. @aegonwolf, I reinstalled from main as tomicapretto recommended: first by uninstalling bambi and then running:
pip install -U git+https://github.com/bambinos/bambi.git@main
Many thanks,
Dean
I think I found what’s going on. The development version has the following chunk
https://github.com/bambinos/bambi/blob/be8c622eb6530e1d9a5071dfa1b1e90aad40921e/bambi/families/univariate.py#L16-L21
while the version you’re using has
https://github.com/bambinos/bambi/blob/7d5a83f0bd8888a6c8136b01101548a9d23ef402/bambi/families/univariate.py#L16-L18
I’m sorry but I didn’t recall this fix.
Installing from the main branch in the repository should fix this issue.
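Reading the two chunks side by side, the released code drops the observation dimension unconditionally whenever the mean variable is already present, while the development version presumably only drops what actually exists in the posterior. A rough sketch of that guarded pattern, written directly against xarray and not copied from Bambi's univariate.py:

# Illustrative only -- not Bambi's actual code. Drop the stale mean variable
# and its observation dimension only when each is actually present.
name = "y_mean"
coord_name = "y_mean_obs"

if name in posterior.data_vars:
    posterior = posterior.drop_vars(name)
if coord_name in posterior.dims:
    posterior = posterior.drop_dims(coord_name)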