RuntimeError: cholesky_cpu: U(63,63) is zero, singular U.
This is very similar to #99, but with newer versions of everything. I was again tuning the hyperparameters of a neural net when I hit this runtime error.
ax=0.1.11
botorch=0.2.0
gpytorch=1.1.1
torch=1.5.0
python=3.8.1
os=Ubuntu 18.04
I have a lot of data that might be enough to reproduce the failure with some effort, but I'm likely missing a few key things. When I restart the tuning run I’ll make sure I capture the random seed and turn up the auto-save frequency.
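One way to avoid losing the last few trials before a crash is to append every trial to a log and force it to disk immediately, with the seed stored in each row. The following is a minimal sketch, not the code used in this run: `append_trial`, `new_seed`, and the `FIELDS` column layout are hypothetical names, and how the seed is handed to the optimizer is left out.

```python
import csv
import os
import time

# Hypothetical column layout: one row per completed trial.
FIELDS = ["run", "trial", "seed", "dropout", "num_layers", "fc_dim", "lr", "score"]


def new_seed():
    """Pick a seed explicitly so it can be logged and reused later."""
    return int(time.time_ns() % (2 ** 31))


def append_trial(path, row):
    """Append one trial record and force it to disk before returning."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(row)
        # Flush Python's buffer, then ask the OS to write the file out,
        # so the row survives a crash in the next trial.
        f.flush()
        os.fsync(f.fileno())
```

Calling `append_trial` right after each objective is computed means at most the trial in flight is lost when the process dies.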
Here’s the parameter definition I was using:
ax_range = [
    {
        "name": "dropout",
        "type": "range",
        "bounds": [0.0, 1.0],
        "value_type": "float",
        "log_scale": False,
    },
    {
        "name": "num_layers",
        "type": "range",
        "bounds": [1, 6],
        "value_type": "int",
        "log_scale": False,
    },
    {
        "name": "fc_dim",
        "type": "range",
        "bounds": [10, 1000],
        "value_type": "int",
        "log_scale": True,
    },
    {
        "name": "lr",
        "type": "range",
        "bounds": [1e-5, 0.1],
        "value_type": "float",
        "log_scale": True,
    },
]
I recorded most of the configurations & objectives here: singular-cholesky.csv.log, but lost the last few because they didn’t auto-save before the crash. The final few objectives (rounded) were:
run=5 trial=104 score=2.57 best_score=1.69
run=5 trial=105 score=1.78 best_score=1.69
run=5 trial=106 score=2.35 best_score=1.69
run=5 trial=107 score=1.82 best_score=1.69
run=5 trial=108 score=3.96 best_score=1.69
run=5 trial=109 score=2.61 best_score=1.69
run=5 trial=110 score=2.14 best_score=1.69
run=5 trial=111 score=1.71 best_score=1.69
run=5 trial=112 score=2.39 best_score=1.69
run=5 trial=113 score=2.00 best_score=1.69
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-9-df78cf250713> in <module>
...
full stack trace:
~/anaconda3/envs/cvnlp/lib/python3.8/site-packages/ax/service/ax_client.py in get_next_trial(self)
289 Tuple of trial parameterization, trial index
290 """
--> 291 trial = self.experiment.new_trial(generator_run=self._gen_new_generator_run())
292 logger.info(
293 f"Generated new trial {trial.index} with parameters "
~/anaconda3/envs/cvnlp/lib/python3.8/site-packages/ax/service/ax_client.py in _gen_new_generator_run(self, n)
928 # Filter out GPYTorch warnings to avoid confusing users.
929 warnings.simplefilter("ignore")
--> 930 return not_none(self.generation_strategy).gen(
931 experiment=self.experiment,
932 n=n,
~/anaconda3/envs/cvnlp/lib/python3.8/site-packages/ax/modelbridge/generation_strategy.py in gen(self, experiment, data, n, **kwargs)
339 )
340 model = not_none(self.model)
--> 341 generator_run = model.gen(
342 n=n,
343 **consolidate_kwargs(
~/anaconda3/envs/cvnlp/lib/python3.8/site-packages/ax/modelbridge/base.py in gen(self, n, search_space, optimization_config, pending_observations, fixed_features, model_gen_options)
608
609 # Apply terminal transform and gen
--> 610 observation_features, weights, best_obsf, gen_metadata = self._gen(
611 n=n,
612 search_space=search_space,
~/anaconda3/envs/cvnlp/lib/python3.8/site-packages/ax/modelbridge/array.py in _gen(self, n, search_space, pending_observations, fixed_features, model_gen_options, optimization_config)
200 )
201 # Generate the candidates
--> 202 X, w, gen_metadata = self._model_gen(
203 n=n,
204 bounds=bounds,
~/anaconda3/envs/cvnlp/lib/python3.8/site-packages/ax/modelbridge/torch.py in _model_gen(self, n, bounds, objective_weights, outcome_constraints, linear_constraints, fixed_features, pending_observations, model_gen_options, rounding_func, target_fidelities)
202 tensor_rounding_func = self._array_callable_to_tensor_callable(rounding_func)
203 # pyre-fixme[16]: `Optional` has no attribute `gen`.
--> 204 X, w, gen_metadata = self.model.gen(
205 n=n,
206 bounds=bounds,
~/anaconda3/envs/cvnlp/lib/python3.8/site-packages/ax/models/torch/botorch.py in gen(self, n, bounds, objective_weights, outcome_constraints, linear_constraints, fixed_features, pending_observations, model_gen_options, rounding_func, target_fidelities)
355 inequality_constraints = None
356
--> 357 acquisition_function = self.acqf_constructor( # pyre-ignore: [28]
358 model=model,
359 objective_weights=objective_weights,
~/anaconda3/envs/cvnlp/lib/python3.8/site-packages/ax/models/torch/botorch_defaults.py in get_NEI(model, objective_weights, outcome_constraints, X_observed, X_pending, **kwargs)
200 objective=obj_tf, constraints=con_tfs or [], infeasible_cost=inf_cost
201 )
--> 202 return get_acquisition_function(
203 acquisition_function_name="qNEI",
204 model=model,
~/anaconda3/envs/cvnlp/lib/python3.8/site-packages/botorch/acquisition/utils.py in get_acquisition_function(acquisition_function_name, model, objective, X_observed, X_pending, mc_samples, qmc, seed, **kwargs)
86 )
87 elif acquisition_function_name == "qNEI":
---> 88 return monte_carlo.qNoisyExpectedImprovement(
89 model=model,
90 X_baseline=X_observed,
~/anaconda3/envs/cvnlp/lib/python3.8/site-packages/botorch/acquisition/monte_carlo.py in __init__(self, model, X_baseline, sampler, objective, X_pending, prune_baseline)
216 )
217 if prune_baseline:
--> 218 X_baseline = prune_inferior_points(
219 model=model, X=X_baseline, objective=objective
220 )
~/anaconda3/envs/cvnlp/lib/python3.8/site-packages/botorch/acquisition/utils.py in prune_inferior_points(model, X, objective, num_samples, max_frac)
214 sampler = SobolQMCNormalSampler(num_samples=num_samples)
215 with torch.no_grad():
--> 216 posterior = model.posterior(X=X)
217 samples = sampler(posterior)
218 if objective is None:
~/anaconda3/envs/cvnlp/lib/python3.8/site-packages/botorch/models/gpytorch.py in posterior(self, X, output_indices, observation_noise, **kwargs)
300 X=X, original_batch_shape=self._input_batch_shape
301 )
--> 302 mvn = self(X)
303 if observation_noise is not False:
304 if torch.is_tensor(observation_noise):
~/anaconda3/envs/cvnlp/lib/python3.8/site-packages/gpytorch/models/exact_gp.py in __call__(self, *args, **kwargs)
326 # Make the prediction
327 with settings._use_eval_tolerance():
--> 328 predictive_mean, predictive_covar = self.prediction_strategy.exact_prediction(full_mean, full_covar)
329
330 # Reshape predictive mean to match the appropriate event shape
~/anaconda3/envs/cvnlp/lib/python3.8/site-packages/gpytorch/models/exact_prediction_strategies.py in exact_prediction(self, joint_mean, joint_covar)
300
301 return (
--> 302 self.exact_predictive_mean(test_mean, test_train_covar),
303 self.exact_predictive_covar(test_test_covar, test_train_covar),
304 )
~/anaconda3/envs/cvnlp/lib/python3.8/site-packages/gpytorch/models/exact_prediction_strategies.py in exact_predictive_mean(self, test_mean, test_train_covar)
318 # You **cannot* use addmv here, because test_train_covar may not actually be a non lazy tensor even for an exact
319 # GP, and using addmv requires you to delazify test_train_covar, which is obviously a huge no-no!
--> 320 res = (test_train_covar @ self.mean_cache.unsqueeze(-1)).squeeze(-1)
321 res = res + test_mean
322
~/anaconda3/envs/cvnlp/lib/python3.8/site-packages/gpytorch/utils/memoize.py in g(self, *args, **kwargs)
32 cache_name = name if name is not None else method
33 if not is_in_cache(self, cache_name):
---> 34 add_to_cache(self, cache_name, method(self, *args, **kwargs))
35 return get_from_cache(self, cache_name)
36
~/anaconda3/envs/cvnlp/lib/python3.8/site-packages/gpytorch/models/exact_prediction_strategies.py in mean_cache(self)
267
268 train_labels_offset = (self.train_labels - train_mean).unsqueeze(-1)
--> 269 mean_cache = train_train_covar.inv_matmul(train_labels_offset).squeeze(-1)
270
271 if settings.detach_test_caches.on():
~/anaconda3/envs/cvnlp/lib/python3.8/site-packages/gpytorch/lazy/lazy_tensor.py in inv_matmul(self, right_tensor, left_tensor)
932 func = InvMatmul
933 if left_tensor is None:
--> 934 return func.apply(self.representation_tree(), False, right_tensor, *self.representation())
935 else:
936 return func.apply(self.representation_tree(), True, left_tensor, right_tensor, *self.representation())
~/anaconda3/envs/cvnlp/lib/python3.8/site-packages/gpytorch/functions/_inv_matmul.py in forward(ctx, representation_tree, has_left, *args)
45 res = left_tensor @ res
46 else:
---> 47 solves = _solve(lazy_tsr, right_tensor)
48 res = solves
49
~/anaconda3/envs/cvnlp/lib/python3.8/site-packages/gpytorch/functions/_inv_matmul.py in _solve(lazy_tsr, rhs)
9 def _solve(lazy_tsr, rhs):
10 if settings.fast_computations.solves.off() or lazy_tsr.size(-1) <= settings.max_cholesky_size.value():
---> 11 return lazy_tsr._cholesky()._cholesky_solve(rhs)
12 else:
13 with torch.no_grad():
~/anaconda3/envs/cvnlp/lib/python3.8/site-packages/gpytorch/utils/memoize.py in g(self, *args, **kwargs)
32 cache_name = name if name is not None else method
33 if not is_in_cache(self, cache_name):
---> 34 add_to_cache(self, cache_name, method(self, *args, **kwargs))
35 return get_from_cache(self, cache_name)
36
~/anaconda3/envs/cvnlp/lib/python3.8/site-packages/gpytorch/lazy/lazy_tensor.py in _cholesky(self)
412
413 # contiguous call is necessary here
--> 414 cholesky = psd_safe_cholesky(evaluated_mat).contiguous()
415 return NonLazyTensor(cholesky)
416
~/anaconda3/envs/cvnlp/lib/python3.8/site-packages/gpytorch/utils/cholesky.py in psd_safe_cholesky(A, upper, out, jitter)
46 except RuntimeError:
47 continue
---> 48 raise e
~/anaconda3/envs/cvnlp/lib/python3.8/site-packages/gpytorch/utils/cholesky.py in psd_safe_cholesky(A, upper, out, jitter)
23 """
24 try:
---> 25 L = torch.cholesky(A, upper=upper, out=out)
26 return L
27 except RuntimeError as e:
RuntimeError: cholesky_cpu: U(63,63) is zero, singular U.
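For readers unfamiliar with this failure mode: a Cholesky factorization hits a zero pivot when the matrix is singular, and a kernel matrix becomes singular when two training inputs coincide and nothing is added to the diagonal. The sketch below reproduces that in pure Python and shows the usual retry with a small diagonal "jitter", which is roughly the shape of the try/except loop visible in the `psd_safe_cholesky` frames above. It is an illustration only: the RBF kernel, the tolerance, and the jitter schedule are assumptions, not gpytorch's actual values, and it does not establish that duplicated points caused the crash in this run.

```python
import math


def rbf_kernel(xs, lengthscale=1.0):
    """Noise-free RBF kernel matrix for 1-D inputs."""
    n = len(xs)
    return [
        [math.exp(-0.5 * ((xs[i] - xs[j]) / lengthscale) ** 2) for j in range(n)]
        for i in range(n)
    ]


def cholesky(a, tol=1e-12):
    """Lower-triangular Cholesky factor; raises if a pivot is not clearly positive."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                pivot = a[i][i] - s
                if pivot <= tol:
                    raise ValueError(f"pivot ({i},{i}) is {pivot:.3e}: numerically singular")
                l[i][j] = math.sqrt(pivot)
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]
    return l


def cholesky_with_jitter(a, jitters=(1e-6, 1e-5, 1e-4)):
    """Try a plain factorization, then retry with increasing diagonal jitter."""
    try:
        return cholesky(a), 0.0
    except ValueError:
        pass
    n = len(a)
    for jitter in jitters:
        bumped = [[a[i][j] + (jitter if i == j else 0.0) for j in range(n)] for i in range(n)]
        try:
            return cholesky(bumped), jitter
        except ValueError:
            continue
    raise ValueError("matrix is singular even after adding jitter")
```

With inputs such as `[0.1, 0.5, 0.5]`, `cholesky(rbf_kernel(xs))` fails at the last pivot, while `cholesky_with_jitter` succeeds with the smallest jitter. When even the largest jitter is not enough, the original error is re-raised, which matches the final `raise e` in the traceback.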
Issue Analytics
- State:
- Created 3 years ago
- Comments:11 (8 by maintainers)
Top GitHub Comments
I’m of the opinion that numeric stability by itself isn’t worth a huge effort to fix when the underlying issue really comes down to operator error – or, put another way, that moving forward appropriately really requires operator attention. I think the most appropriate fix here might be a better error message saying something like “Perhaps one or more parameters has converged?” Automated diagnosis of this would be nice, but again seems like more effort than it’s worth. This GitHub issue will certainly help document it for future users, but a more explicit message would be a sufficient fix IMHO.
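As a rough idea of what an operator-side check for "one or more parameters has converged" could look like, here is a sketch that scans the logged configurations for repeated points and for parameters that have stopped moving. The helper names, the tolerance, and the `last_n` window are hypothetical; the bounds and log scales are copied from the parameter definition in the issue. This is not part of Ax or BoTorch.

```python
import math

# (lower, upper, log_scale) per parameter, copied from the issue's ax_range.
SPACE = {
    "dropout": (0.0, 1.0, False),
    "num_layers": (1, 6, False),
    "fc_dim": (10, 1000, True),
    "lr": (1e-5, 0.1, True),
}


def normalize(config):
    """Map a configuration to the unit cube, using log scale where declared."""
    unit = []
    for name, (lo, hi, log_scale) in SPACE.items():
        v = config[name]
        if log_scale:
            v, lo, hi = math.log(v), math.log(lo), math.log(hi)
        unit.append((v - lo) / (hi - lo))
    return unit


def near_duplicate_pairs(configs, tol=1e-6):
    """Index pairs whose normalized parameters agree within tol in every dimension."""
    points = [normalize(c) for c in configs]
    pairs = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if max(abs(a - b) for a, b in zip(points[i], points[j])) < tol:
                pairs.append((i, j))
    return pairs


def collapsed_parameters(configs, last_n=10, tol=1e-6):
    """Names of parameters that did not change over the last_n configurations."""
    recent = [normalize(c) for c in configs[-last_n:]]
    if len(recent) < 2:
        return []
    names = list(SPACE)
    return [
        names[d]
        for d in range(len(names))
        if max(p[d] for p in recent) - min(p[d] for p in recent) < tol
    ]
```

Running both checks on the trial log before asking for the next trial would at least tell the operator whether repeated or collapsed suggestions are present when the Cholesky error shows up.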
I’m not passing either for SEM - just passing a float:
so whatever the default is. My understanding from the docs is that this should cause the noise to be treated as a free parameter to optimize for. This problem should not be treated as noiseless.