
ValueError: Dataset does not contain the dimensions: {'y_mean_obs'}

See original GitHub issue

OS: Linux
Bambi: 0.7.1
Python: 3.8

Issue: I am receiving the above error when Bambi model and idata objects are stored inside a dictionary or list. When they are not stored inside a collection, the expected behavior is observed. This is a recent issue since installing Bambi 0.7.1 (my code worked previously).


# ... preceding function code elided ...

    # Define model parameters
    params = {
        'family': 'bernoulli',
        'chains': 3,
        'draws': 10,
        'tune': 10}

    models = []
    classifiers = []
    predictions = []
    for i in range(len(train_splits)):
        print("\nComputing predictions for sampling run {}".format(i + 1))
        x_train, y_train = train_splits[i]
        x_test, y_test = test_splits[i]

        # Run bambi model
        x_train['y'] = y_train.values

        # Get the model formula
        f = get_formula(x_train.columns[:-1])

        model = bmb.Model(f, x_train, family=params['family'])
        clf = model.fit(draws=params['draws'], tune=params['tune'],
                        chains=params['chains'], init='auto')

        models.append(model)
        classifiers.append(clf)

        # Run predictions
        idata = model.predict(clf, data=x_test, inplace=False)
        mean_preds = idata.posterior["y_mean"].values
        predictions.append(mean_preds)

    # Collect outputs
    output_dict = {
        'predictions': predictions,
        'models': models,
        'classifiers': classifiers,
        'train_splits': train_splits,
        'test_splits': test_splits
    }
    return output_dict
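`get_formula` is the poster's own helper and its body is not shown. Judging from the formula printed further down, it plausibly just joins the predictor column names into a Wilkinson-style formula string. A minimal hypothetical sketch under that assumption (the `response` parameter and the function body are my guesses, not the poster's code):

```python
def get_formula(columns, response="y"):
    """Join predictor column names into a Wilkinson-style formula string."""
    return "{} ~ {}".format(response, " + ".join(columns))

print(get_formula(["fusion", "cooccurence"]))  # y ~ fusion + cooccurence
```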

With a single iteration I get a model of:

print(output['models'][0])

output:

Formula: y ~ neighborhood_transferred + fusion + cooccurence + coexpression + coexpression_transferred + experiments + experiments_transferred + database + database_transferred + textmining + textmining_transferred
Family name: Bernoulli
Link: logit
Observations: 127151
Priors:
  Common-level effects
    Intercept ~ Normal(mu: 0, sigma: 6.6279)
    neighborhood_transferred ~ Normal(mu: 0.0, sigma: 3.5808)
    fusion ~ Normal(mu: 0.0, sigma: 2.596)
    cooccurence ~ Normal(mu: 0.0, sigma: 3.6322)
    coexpression ~ Normal(mu: 0.0, sigma: 2.9462)
    coexpression_transferred ~ Normal(mu: 0.0, sigma: 2.9241)
    experiments ~ Normal(mu: 0.0, sigma: 2.6692)
    experiments_transferred ~ Normal(mu: 0.0, sigma: 2.8198)
    database ~ Normal(mu: 0.0, sigma: 2.9285)
    database_transferred ~ Normal(mu: 0.0, sigma: 2.5707)
    textmining ~ Normal(mu: 0.0, sigma: 3.5179)
    textmining_transferred ~ Normal(mu: 0.0, sigma: 3.7341)

And an InferenceData of:

print(output['classifiers'][0])
(screenshot of the InferenceData object omitted)

When trying to make a prediction on new data with exactly the same column names as the training data, I run:

output['models'][0].predict(idata=output['classifiers'][0], data=x, inplace=False)

output:

/mnt/mnemo5/sum02dean/sl_projects/handover/STRINGSCORE/src/scripts/nb.ipynb Cell 4 in <cell line: 1>()
----> 1 output['models'][0].predict(idata=output['classifiers'][0], data=x, inplace=False)

File ~/miniconda3/envs/string-score-2.0/lib/python3.8/site-packages/bambi/models.py:897, in Model.predict(self, idata, kind, data, draws, inplace)
    892 # 'linear_predictor' is of shape
    893 # * (chain_n, draw_n, obs_n) for univariate models
    894 # * (chain_n, draw_n, response_n, obs_n) for multivariate models
    896 if kind == "mean":
--> 897     idata.posterior = self.family.predict(self, posterior, linear_predictor)
    898 else:
    899     pps_kwargs = {
    900         "model": self,
    901         "posterior": posterior,
   (...)
    904         "draw_n": draw_n,
    905     }

File ~/miniconda3/envs/string-score-2.0/lib/python3.8/site-packages/bambi/families/univariate.py:18, in UnivariateFamily.predict(self, model, posterior, linear_predictor)
     16 # Drop var/dim if already present
     17 if name in posterior.data_vars:
---> 18     posterior = posterior.drop_vars(name).drop_dims(coord_name)
     20 coords = ("chain", "draw", coord_name)
     21 posterior[name] = (coords, mean)

File ~/miniconda3/envs/string-score-2.0/lib/python3.8/site-packages/xarray/core/dataset.py:4602, in Dataset.drop_dims(self, drop_dims, errors)
   4600     missing_dims = drop_dims - set(self.dims)
   4601     if missing_dims:
-> 4602         raise ValueError(
   4603             f"Dataset does not contain the dimensions: {missing_dims}"
   4604         )
   4606 drop_vars = {k for k, v in self._variables.items() if set(v.dims) & drop_dims}
   4607 return self.drop_vars(drop_vars)

ValueError: Dataset does not contain the dimensions: {'y_mean_obs'}
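The failing line in `univariate.py` chains `drop_vars(name).drop_dims(coord_name)`: it checks that the variable is present, but not that the dimension is. If the posterior already contains `y_mean` under a differently named observation dimension, the second call raises exactly this error. A minimal sketch of that behavior, assuming only xarray and NumPy (the dimension name `y_dim_0` is an illustrative assumption):

```python
import numpy as np
import xarray as xr

# A posterior-like Dataset where "y_mean" exists, but its observation
# dimension is named "y_dim_0" rather than the expected "y_mean_obs".
posterior = xr.Dataset(
    {"y_mean": (("chain", "draw", "y_dim_0"), np.zeros((3, 10, 5)))}
)

# Mirrors bambi/families/univariate.py line 18: the variable drop succeeds,
# but the chained dimension drop raises because "y_mean_obs" is absent.
try:
    posterior.drop_vars("y_mean").drop_dims("y_mean_obs")
except ValueError as err:
    print(err)
```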

Accessing the Bambi-specific objects via list indexing or dictionary lookup triggers the error. When not using an iterable or collection, the code works fine:

# Unpack training data
xt = output['train_splits'][0][0]  # train features
y = output['train_splits'][0][1]   # train labels
xt['y'] = y.values

# Get the model formula
f = get_formula(xt.columns[:-1])
print(f)

model = bmb.Model(f, xt, family='bernoulli')
clf = model.fit(draws=10, tune=10,
                chains=3, init='auto')

# x: new data with the same column names as the training data
model.predict(idata=clf, data=x, inplace=False)
print("ran without errors")

output:

y ~ neighborhood_transferred + fusion + cooccurence + coexpression + coexpression_transferred + experiments + experiments_transferred + database + database_transferred + textmining
Modeling the probability that y==1
Only 10 samples in chain.
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (3 chains in 4 jobs)
NUTS: [textmining, database_transferred, database, experiments_transferred, experiments, coexpression_transferred, coexpression, cooccurence, fusion, neighborhood_transferred, Intercept]

 100.00% [60/60 00:01<00:00 Sampling 3 chains, 0 divergences]
Sampling 3 chains for 10 tune and 10 draw iterations (30 + 30 draws total) took 2 seconds.
/mnt/mnemo5/sum02dean/miniconda3/envs/string-score-2.0/lib/python3.8/site-packages/pymc3/sampling.py:643: UserWarning: The number of samples is too small to check convergence reliably.
  warnings.warn("The number of samples is too small to check convergence reliably.")
Ran without errors

Issue Analytics

  • State: closed
  • Created: a year ago
  • Reactions: 1
  • Comments: 11

Top GitHub Comments

1 reaction
Sum02dean commented, Apr 16, 2022

Thank you @tomicapretto, this solved the issue for me. @aegonwolf I reinstalled from main as tomicapretto recommended: first uninstalling bambi, then running pip install -U git+https://github.com/bambinos/bambi.git@main

Many thanks,

Dean

1 reaction
tomicapretto commented, Apr 13, 2022

I think I found what’s going on. The development version has the following chunk

https://github.com/bambinos/bambi/blob/be8c622eb6530e1d9a5071dfa1b1e90aad40921e/bambi/families/univariate.py#L16-L21

while the version you’re using has

https://github.com/bambinos/bambi/blob/7d5a83f0bd8888a6c8136b01101548a9d23ef402/bambi/families/univariate.py#L16-L18

I’m sorry but I didn’t recall this fix.

Installing from the main branch in the repository should fix this issue.
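Comparing the two linked snippets, the plausible difference is that the development version guards the dimension drop as well as the variable drop, so a posterior whose dimension name doesn't match no longer raises. A hedged sketch of that pattern (the names `name` and `coord_name` come from the traceback above; `drop_existing` is a hypothetical helper, and the actual dev-branch code is at the link):

```python
import numpy as np
import xarray as xr

def drop_existing(posterior: xr.Dataset, name: str, coord_name: str) -> xr.Dataset:
    """Drop a variable and a dimension, each only if actually present."""
    if name in posterior.data_vars:
        posterior = posterior.drop_vars(name)
    if coord_name in posterior.dims:
        posterior = posterior.drop_dims(coord_name)
    return posterior

# A posterior whose "y_mean" uses a differently named dimension:
ds = xr.Dataset({"y_mean": (("chain", "draw", "y_dim_0"), np.zeros((2, 4, 3)))})
cleaned = drop_existing(ds, "y_mean", "y_mean_obs")  # no ValueError raised
print("y_mean" in cleaned.data_vars)
```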
