SoftmaxLikelihood and building a multiclass classification model
Hi,
My issue is very similar to #994 by @cherepanovic, but I still don't understand multiclass classification after reading #994. I'm opening a new issue so as not to clutter #994 with my question.
I am trying to build a multiclass classification model using `SoftmaxLikelihood`. However, I am not sure what the arguments of this likelihood mean (the documentation on it is very succinct), and I haven't been able to figure it out by myself because I can't reproduce the only example I've found (SVDKL on CIFAR). Running it without any modification, I get the error:
---> 50 from densenet import DenseNet
51
52 class DenseNetFeatureExtractor(DenseNet):
ModuleNotFoundError: No module named 'densenet'
If I define DenseNet as in your file densenet.py, I obtain the following error:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-8-aa721b7a7db0> in <module>
177 for epoch in range(1, n_epochs + 1):
178 with gpytorch.settings.use_toeplitz(False):
--> 179 train(epoch)
180 test()
181 scheduler.step()
<ipython-input-8-aa721b7a7db0> in train(epoch)
153 optimizer.zero_grad()
154 output = model(data)
--> 155 loss = -mll(output, target)
156 loss.backward()
157 optimizer.step()
~/housekeeping/virtualenv/ml2/lib/python3.6/site-packages/gpytorch/module.py in __call__(self, *inputs, **kwargs)
22
23 def __call__(self, *inputs, **kwargs):
---> 24 outputs = self.forward(*inputs, **kwargs)
25 if isinstance(outputs, list):
26 return [_validate_module_outputs(output) for output in outputs]
~/housekeeping/virtualenv/ml2/lib/python3.6/site-packages/gpytorch/mlls/variational_elbo.py in forward(self, variational_dist_f, target, **kwargs)
75 :return: Variational ELBO. Output shape corresponds to batch shape of the model/input data.
76 """
---> 77 return super().forward(variational_dist_f, target, **kwargs)
~/housekeeping/virtualenv/ml2/lib/python3.6/site-packages/gpytorch/mlls/_approximate_mll.py in forward(self, approximate_dist_f, target, **kwargs)
55 # Get likelihood term and KL term
56 num_batch = approximate_dist_f.event_shape.numel()
---> 57 log_likelihood = self._log_likelihood_term(approximate_dist_f, target, **kwargs).div(num_batch)
58 kl_divergence = self.model.variational_strategy.kl_divergence().div(self.num_data / self.beta)
59
~/housekeeping/virtualenv/ml2/lib/python3.6/site-packages/gpytorch/mlls/variational_elbo.py in _log_likelihood_term(self, variational_dist_f, target, **kwargs)
59
60 def _log_likelihood_term(self, variational_dist_f, target, **kwargs):
---> 61 return self.likelihood.expected_log_prob(target, variational_dist_f, **kwargs).sum(-1)
62
63 def forward(self, variational_dist_f, target, **kwargs):
~/housekeeping/virtualenv/ml2/lib/python3.6/site-packages/gpytorch/likelihoods/likelihood.py in expected_log_prob(self, observations, function_dist, *args, **kwargs)
37
38 def expected_log_prob(self, observations, function_dist, *args, **kwargs):
---> 39 likelihood_samples = self._draw_likelihood_samples(function_dist, *args, **kwargs)
40 res = likelihood_samples.log_prob(observations).mean(dim=0)
41 return res
~/housekeeping/virtualenv/ml2/lib/python3.6/site-packages/gpytorch/likelihoods/likelihood.py in _draw_likelihood_samples(self, function_dist, sample_shape, *args, **kwargs)
34 function_dist = base_distributions.Independent(function_dist, num_event_dims - 1)
35 function_samples = function_dist.rsample(sample_shape)
---> 36 return self.forward(function_samples, *args, **kwargs)
37
38 def expected_log_prob(self, observations, function_dist, *args, **kwargs):
~/housekeeping/virtualenv/ml2/lib/python3.6/site-packages/gpytorch/likelihoods/softmax_likelihood.py in forward(self, function_samples, *params, **kwargs)
36 num_features, num_data = function_samples.shape[-2:]
37 if num_features != self.num_features:
---> 38 raise RuntimeError("There should be %d features" % self.num_features)
39
40 if self.mixing_weights is not None:
RuntimeError: There should be 132 features
I may well be making a mistake, but I am copying and pasting everything from the example without modifying any code myself.
My questions are:

- What does the argument `num_features` to `SoftmaxLikelihood` mean? Is this the number of features of the inputs (e.g. if the inputs are MNIST images, is `num_features` 28*28=784)? If so, why does the likelihood function need access to the number of input features if it doesn't deal with the inputs themselves?
- What does the argument `num_classes` to `SoftmaxLikelihood` mean? I'm assuming it is just the number of classes? E.g. in MNIST `num_classes` would be 10?
- What are the mixing weights in a softmax function? The softmax that I know doesn't need any weights; it simply normalizes the logits by exponentiating them and dividing by the sum of all the exponentiated logits.
- What does the argument `num_tasks` to `MultitaskVariationalStrategy` mean? This also has to be the number of classes, right?
- What are the appropriate target dimensions when there are multiple classes? Should they be:
  - One-dimensional, with `num_samples` elements, each target represented by a number (e.g. 0-9 in MNIST)?
  - Two-dimensional, same as before but as a column vector, so the dimensions are `num_samples x 1`?
  - Two-dimensional with a one-hot encoding, so the dimensions would be `num_samples x num_classes`?
Thanks a lot in advance.
Top GitHub Comments
See equation 1 of the SVDKL paper. The output of the GP is a multi-output multivariate normal distribution of shape `n x f`, where `f` is `num_features`. The likelihood uses a linear mixing parameter (`A` in equation 1) to reduce this `f`-dimensional output to a `c`-dimensional output, where `c` is the number of classes, `num_classes`.

- `num_features` refers to `f=132`: the number of (independent) features that are output from the GP. If you look at the output of the GP layer, it will be a `MultitaskMultivariateNormal` distribution with event shape `n x 132`.
- `num_classes` is `c`, the number of classification classes (e.g. 10 for MNIST).
- For the mixing weights, again see equation 1 of the paper (a quick sketch of the mixing is given after this list).
- `num_tasks` here refers to `f`, the number of output dimensions in the GP (so 132).
- You are correct - the output of the likelihood is a `Categorical` distribution, which is represented by a `c`-dimensional vector (one output for each class).

For more details, please refer to the SVDKL paper.
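To make the mixing concrete, here is a minimal plain-PyTorch sketch of what equation 1 does with a single sample of the GP output. The sizes and the random tensors are purely illustrative; in GPyTorch the mixing matrix is a learned parameter inside `SoftmaxLikelihood`, not something you create by hand.

```python
import torch
import torch.nn.functional as F

# Illustrative sizes only: n data points, f GP output features, c classes.
n, f, c = 8, 132, 10

gp_sample = torch.randn(n, f)       # one sample drawn from the n x f GP output
A = torch.randn(c, f)               # stand-in for the learned mixing weights ("A" in equation 1)

logits = gp_sample @ A.t()          # n x c class logits
probs = F.softmax(logits, dim=-1)   # n x c class probabilities; each row sums to 1
```

The mixing weights are what turn the `f` GP features into `c` class logits; the softmax itself is still the usual weight-free normalization.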
`n` is the number of data points, and it shouldn't influence the number of neural network features or the dimensionality of the data. `f` refers to the function that the GP learns - it is not a counting variable. `Q = J` in our case - i.e. there is one Gaussian process for each neural network feature. (According to the experiments and the authors of the paper, this is the best setup to use.)
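Putting the pieces together, here is a minimal self-contained sketch of how the arguments line up, assuming a recent GPyTorch release (where this strategy is exposed as `IndependentMultitaskVariationalStrategy`; older releases, like the one in the issue, exposed it as `MultitaskVariationalStrategy`). The class name, kernel choice, sizes, and toy data below are placeholders, not code from the SVDKL example.

```python
import torch
import gpytorch

# Illustrative sizes (not from the issue): f GP output features, c classes.
num_features, num_classes, num_inducing, input_dim = 16, 10, 32, 2


class MultioutputGP(gpytorch.models.ApproximateGP):
    """A batch of `num_features` independent GPs; the likelihood mixes them into class logits."""

    def __init__(self):
        inducing_points = torch.randn(num_features, num_inducing, input_dim)
        variational_distribution = gpytorch.variational.CholeskyVariationalDistribution(
            num_inducing, batch_shape=torch.Size([num_features])
        )
        # num_tasks is f, the number of GP output features -- not the number of classes.
        variational_strategy = gpytorch.variational.IndependentMultitaskVariationalStrategy(
            gpytorch.variational.VariationalStrategy(
                self, inducing_points, variational_distribution, learn_inducing_locations=True
            ),
            num_tasks=num_features,
        )
        super().__init__(variational_strategy)
        self.mean_module = gpytorch.means.ConstantMean(batch_shape=torch.Size([num_features]))
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.RBFKernel(batch_shape=torch.Size([num_features])),
            batch_shape=torch.Size([num_features]),
        )

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(self.mean_module(x), self.covar_module(x))


model = MultioutputGP()
# num_features must match f above; num_classes is c (e.g. 10 for MNIST).
likelihood = gpytorch.likelihoods.SoftmaxLikelihood(num_features=num_features, num_classes=num_classes)
mll = gpytorch.mlls.VariationalELBO(likelihood, model, num_data=100)

x = torch.randn(8, input_dim)           # n = 8 toy inputs (a neural feature extractor would feed in here)
y = torch.randint(num_classes, (8,))    # targets: a one-dimensional vector of class indices, shape (n,)

output = model(x)                       # MultitaskMultivariateNormal with event shape 8 x num_features
loss = -mll(output, y)
loss.backward()
```

Note that the number of GP output features does not have to equal the number of classes: in the SVDKL example `f` is 132 while `c` is 10, with one GP per neural-network feature.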