question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

How to solve 'Input is not Invertible error'?

See original GitHub issue

I am trying to train a GLOW mapping on a custom dataset. However while training, I frequently receive a tensorflow.python.framework.errors_impl.InvalidArgumentError: Input is not invertible error. Upon seeing the logs, I see that the training/validation stats have reached either inf or nan.

I then tried to just reproduce your results for celeba 256x256 Qualitatively. However, I still face such issues. I am lost as to how to debug. I downloaded the celeba-tfr dataset locally.

Command:

python train.py --problem celeba --image_size 256 --n_level 6 --depth 32 --flow_permutation 2 --flow_coupling 0 --seed 0 --learntop --lr 0.001 --n_bits_x 5 --data_dir=./celeba-tfr --verbose --epochs_full_valid=1 --epochs_full_sample=1 --n_train=30 --n_test=30

Namespace:

Namespace(anchor_size=32, beta1=0.9, category='', dal=1, data_dir='./celeba-tfr', depth=32, direct_it
erator=True, epochs=1000000, epochs_full_sample=1, epochs_full_valid=1, epochs_warmup=10, flow_coupli
ng=0, flow_permutation=2, fmap=1, full_test_its=30, gradient_checkpointing=1, image_size=256, inferen
ce=False, learntop=True, local_batch_init=4, local_batch_test=1, local_batch_train=1, logdir='./logs'
, lr=0.001, n_batch_init=256, n_batch_test=50, n_batch_train=64, n_bins=32.0, n_bits_x=5, n_levels=6,
 n_sample=1, n_test=30, n_train=30, n_y=1, optimizer='adamax', pmap=16, polyak_epochs=1, problem='cel
eba', restore_path='', rnd_crop=False, seed=0, test_its=1, top_shape=[4, 4, 384], train_its=1, verbos
e=True, weight_decay=1.0, weight_y=0.0, width=512, ycond=False)

Trace:

Starting training. Logging to /home/ubuntu/glow_/logs/
epoch n_processed n_images ips dtrain dtest dsample dtot train_results test_results msg
0 179.9140625 [2.5411766 2.5411766 0.        1.       ]
1 64 1 0.0 179.9 88.8 177.1 445.7 [2.5411766 2.5411766 0.        1.       ] [2.7737396 2.7737396 0.
      1.       ]  *
64 5.25806736946106 [2.6743338 2.6743338 0.        1.       ]
2 128 2 0.2 5.3 36.1 161.6 203.0 [2.6743338 2.6743338 0.        1.       ] [nan nan  0.  1.]
128 4.962073087692261 [nan nan  0.  1.]
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/clie
nt/session.py", line 1322, in _do_call
    return fn(*args)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/clie
nt/session.py", line 1307, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/clie
nt/session.py", line 1409, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input is not invertible.
         [[Node: model_3/1/28/invconv/MatrixInverse = MatrixInverse[T=DT_FLOAT, adjoint=false, _devic
e="/job:localhost/replica:0/task:0/device:GPU:0"](model/1/28/invconv/W/read)]]
         [[Node: model_3/5/6/f1/l_1/Shape/_79621 = _Recv[client_terminated=false, recv_device="/job:l
ocalhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0",
send_device_incarnation=1, tensor_name="edge_9830_model_3/5/6/f1/l_1/Shape", tensor_type=DT_INT32, _d
evice="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 414, in <module>
    main(hps)
  File "train.py", line 163, in main
    train(sess, model, hps, logdir, visualise)
  File "train.py", line 274, in train
    visualise(epoch)
  File "train.py", line 50, in draw_samples
    x_samples.append(sample_batch(y, [.0]*n_batch))
  File "train.py", line 33, in sample_batch
    y[i*n_batch:i*n_batch + n_batch], eps[i*n_batch:i*n_batch + n_batch]))
  File "/home/ubuntu/glow_/model.py", line 242, in sample
    return m.sess.run(x_sampled, {Y: _y, m.eps_std: _eps_std})
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 900, in run
    run_metadata_ptr)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1135, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1316, in _do_run
    run_metadata)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1335, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input is not invertible.
         [[Node: model_3/1/28/invconv/MatrixInverse = MatrixInverse[T=DT_FLOAT, adjoint=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](model/1/28/invconv/W/read)]]
         [[Node: model_3/5/6/f1/l_1/Shape/_79621 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_9830_model_3/5/6/f1/l_1/Shape", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op 'model_3/1/28/invconv/MatrixInverse', defined at:
  File "train.py", line 414, in <module>
    main(hps)
  File "train.py", line 156, in main
    model = model.model(sess, hps, train_iterator, test_iterator, data_init)
  File "/home/ubuntu/glow_/model.py", line 239, in model
    x_sampled = f_sample(Y, m.eps_std)
  File "/home/ubuntu/glow_/model.py", line 232, in f_sample
    z = decoder(z, eps_std=eps_std)
  File "/home/ubuntu/glow_/model.py", line 97, in decoder
    z, _ = revnet2d(str(i), z, 0, hps, reverse=True)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 183, in func_with_args
    return func(*args, **current_args)
  File "/home/ubuntu/glow_/model.py", line 342, in revnet2d
    z, logdet = revnet2d_step(str(i), z, logdet, hps, reverse)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 183, in func_with_args
    return func(*args, **current_args)
  File "/home/ubuntu/glow_/model.py", line 411, in revnet2d_step
    "invconv", z, logdet, reverse=True)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 183, in func_with_args
    return func(*args, **current_args)
  File "/home/ubuntu/glow_/model.py", line 467, in invertible_1x1_conv
    _w = tf.matrix_inverse(w)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/ops/gen_linalg_ops.py", line 1049, in matrix_inverse
    "MatrixInverse", input=input, adjoint=adjoint, name=name)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3392, in create_op
    op_def=op_def)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1718, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Input is not invertible.
         [[Node: model_3/1/28/invconv/MatrixInverse = MatrixInverse[T=DT_FLOAT, adjoint=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](model/1/28/invconv/W/read)]]
         [[Node: model_3/5/6/f1/l_1/Shape/_79621 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_9830_model_3/5/6/f1/l_1/Shape", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

I suspected it is because of bad learning rates which might make the kernel non-invertible, I played with low LRs, but of no help.

Issue Analytics

  • State:open
  • Created 5 years ago
  • Reactions:3
  • Comments:10

github_iconTop GitHub Comments

7reactions
tatsuhiko-inouecommented, Aug 21, 2018

I also experienced a similar error. I avoided the error using the following modification.

diff --git a/model.py b/model.py
index b918ab0..68cb3fe 100644
--- a/model.py
+++ b/model.py
@@ -373,7 +373,7 @@ def revnet2d_step(name, z, logdet, hps, reverse):
                 h = f("f1", z1, hps.width, n_z)
                 shift = h[:, :, :, 0::2]
                 # scale = tf.exp(h[:, :, :, 1::2])
-                scale = tf.nn.sigmoid(h[:, :, :, 1::2] + 2.)
+                scale = tf.nn.sigmoid(h[:, :, :, 1::2] + 2.) + 1e-10
                 z2 += shift
                 z2 *= scale
                 logdet += tf.reduce_sum(tf.log(scale), axis=[1, 2, 3])
@@ -393,7 +393,7 @@ def revnet2d_step(name, z, logdet, hps, reverse):
                 h = f("f1", z1, hps.width, n_z)
                 shift = h[:, :, :, 0::2]
                 # scale = tf.exp(h[:, :, :, 1::2])
-                scale = tf.nn.sigmoid(h[:, :, :, 1::2] + 2.)
+                scale = tf.nn.sigmoid(h[:, :, :, 1::2] + 2.) + 1e-10
                 z2 /= scale
                 z2 -= shift
                 logdet -= tf.reduce_sum(tf.log(scale), axis=[1, 2, 3])
diff --git a/tfops.py b/tfops.py
index d978419..2e7c556 100644
--- a/tfops.py
+++ b/tfops.py
@@ -449,9 +449,9 @@ def gaussian_diag(mean, logsd):
     o.sample = mean + tf.exp(logsd) * o.eps
     o.sample2 = lambda eps: mean + tf.exp(logsd) * eps
     o.logps = lambda x: -0.5 * \
-        (np.log(2 * np.pi) + 2. * logsd + (x - mean) ** 2 / tf.exp(2. * logsd))
+        (np.log(2 * np.pi) + 2. * logsd + (x - mean) ** 2 / (tf.exp(2. * logsd) + 1e-10))
     o.logp = lambda x: flatten_sum(o.logps(x))
-    o.get_eps = lambda x: (x - mean) / tf.exp(logsd)
+    o.get_eps = lambda x: (x - mean) / (tf.exp(logsd) + 1e-10)
     return o
1reaction
paulchou0309commented, Jan 21, 2019

I met the issue same how to solve it?@tatsuhiko-inoue @nuges01 @arunpatro

Read more comments on GitHub >

github_iconTop Results From Across the Web

Tensorflow not able to invert matrix - Stack Overflow
I think the reason is that your matrix is not invertible. One option is to use Moore-Penrose matrix inversion which is supported by...
Read more >
InvalidArgumentError: Input matrix is not invertible
When I am running. “model.fit(x)”. it is throwing the following error… … InvalidArgumentError: Input matrix is not invertible.
Read more >
Invertible matrix - Wikipedia
Non -square matrices (m-by-n matrices for which m ≠ n) do not have an inverse. However, in some cases such a matrix may...
Read more >
What to Do When Your Hessian Is Not Invertible
Our suggestion is to perform two diagnostics to detect these problems and to alter the reported standard errors or covariances accordingly. For small...
Read more >
Multivariate normal problem specifying sigma - greta forum
Error in py_call_impl(callable, dots$args, dots$keywords): InvalidArgumentError: Input matrix is not invertible.
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found