Increasingly negative loss in variational autoencoder: is it normal?
Hi, not sure if this is an issue. I am training the variational autoencoder on a different set of images with 3 color channels, and I am getting an increasingly negative loss. Is this a normal, valid outcome, or is it a bug?
Shouldn't the loss value always be positive? I am worried that the search for a minimum of the loss function may never converge if the loss is not bounded below by 0.
```
Building model and compiling functions...
L = 2, z_dim = 1, n_hid = 3, binary=True
Starting training...
Epoch 1 of 300 took 36.576s
training loss: 1193603.765134
validation loss: 358401.526396
Epoch 2 of 300 took 34.345s
training loss: 170094.748865
validation loss: -990985.720292
Epoch 3 of 300 took 34.682s
training loss: -948598.243076
validation loss: -2374793.240720
Epoch 4 of 300 took 33.571s
training loss: -2179357.580108
validation loss: -3822347.805930
Epoch 5 of 300 took 36.031s
training loss: -3293897.853456
validation loss: -5299324.057571
```
I tried with 3 or 1024 hidden units, and with a z dimension of either 1 or 2, but the result is the same. With the regular MNIST data I have no issues: the loss is positive and decreases toward 0.
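For reference, binary cross-entropy is only guaranteed to be non-negative when both predictions and targets lie in [0, 1]; once targets leave that range, the loss can become arbitrarily negative, which matches the numbers above. A quick NumPy check (the `bce` helper and its values are illustrative, not taken from the training code):

```python
import numpy as np

def bce(target, pred, eps=1e-7):
    # Elementwise binary cross-entropy, summed over the array.
    pred = np.clip(pred, eps, 1 - eps)
    return -np.sum(target * np.log(pred) + (1 - target) * np.log(1 - pred))

pred = np.full(4, 0.9)
print(bce(np.ones(4), pred))       # targets in [0, 1]: loss >= 0 (~0.42)
print(bce(np.full(4, 3.0), pred))  # targets outside [0, 1]: loss < 0 (~-17.2)
```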
Please have a closer look at which loss function you're using. I guess that for MNIST it uses binary cross-entropy, which requires both predictions and targets to be between 0 and 1. You may want to try mean-squared error instead. Also take care of the network's output nonlinearity: for MNIST it is probably a sigmoid, producing outputs in (0, 1). You might also have success scaling the input data to [-1, 1] and using tanh outputs (as in the DCGAN paper, for example).
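A minimal, framework-neutral sketch of those suggestions (all function names and the dummy data are illustrative, not from the issue's actual model):

```python
import numpy as np

# Option 1: mean-squared error as the reconstruction term. Unlike binary
# cross-entropy, it is non-negative for any real-valued inputs and targets.
def mse_reconstruction(x, x_hat):
    return np.mean((x - x_hat) ** 2)

# Option 2: keep binary cross-entropy, but make both sides live in [0, 1]:
# rescale the inputs and put a sigmoid on the decoder output.
def to_unit_range(x):
    # Min-max scaling to [0, 1]; assumes x is not constant.
    return (x - x.min()) / (x.max() - x.min())

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Option 3 (DCGAN-style): rescale inputs to [-1, 1] and use tanh outputs,
# paired with a loss such as MSE that tolerates negative values.
def to_symmetric_range(x):
    return 2.0 * to_unit_range(x) - 1.0

x = np.random.randint(0, 256, size=(4, 3, 8, 8)).astype(np.float32)  # dummy RGB batch
print(to_unit_range(x).min(), to_unit_range(x).max())            # 0.0 1.0
print(to_symmetric_range(x).min(), to_symmetric_range(x).max())  # -1.0 1.0
```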
This will do it: subtracting the mean shifts the mean of the resulting array to 0, while dividing by std_dev scales the standard deviation (and variance) to 1.
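A minimal sketch of that standardization, assuming NumPy arrays (the `standardize` helper and the dummy batch are illustrative):

```python
import numpy as np

def standardize(x):
    # Shift the mean to 0 and scale the standard deviation to 1.
    mean = x.mean()
    std_dev = x.std()
    return (x - mean) / std_dev

images = np.random.rand(10, 3, 32, 32).astype(np.float32)  # dummy 3-channel batch
z = standardize(images)
print(z.mean(), z.std())  # ~0.0 and ~1.0
```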