LinAlgError (Array must not contain infs or NaNs) thrown in get_mu_tensor

Below is a simple piece of code to try YellowFin on my dataset.

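import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder, tf.layers, tf.Session)
import yellowfin  # yellowfin.py from the YellowFin TF repo, on the import path

# train_x, train_y, hidden_dim and epochs are assumed to be defined beforehand.
x = tf.placeholder( tf.float32, [ None, train_x.shape[ 1 ] ] )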
y = tf.placeholder( tf.float32, [ None, train_y.shape[ 1 ] ] )
m = tf.layers.dense( x, hidden_dim )
m = tf.layers.batch_normalization( m )
m = tf.nn.elu( m )
m = tf.layers.dense( m, hidden_dim )
m = tf.layers.batch_normalization( m )
m = tf.nn.elu( m )
m = tf.layers.dense( m, hidden_dim )
m = tf.layers.batch_normalization( m )
m = tf.nn.elu( m )
m = tf.layers.dense( m, train_y.shape[ 1 ] )
prediction = tf.nn.softmax( m )
loss = tf.reduce_mean( tf.nn.softmax_cross_entropy_with_logits( labels=y, logits=m ) )
optimizer = yellowfin.YFOptimizer().minimize( loss )

s = tf.Session()
s.run( tf.global_variables_initializer() )
for epoch in range( epochs ):
    _, h = s.run( [ optimizer, loss ], feed_dict={ x: train_x, y: train_y } )

Usually, it crashes and throws the following exception.

Caused by op 'update_hyper/cond/PyFuncStateless', defined at:
  File "test2.py", line 47, in <module>
    optimizer = yf.YFOptimizer( learning_rate=1., momentum=0. ).minimize( loss )
  File "/data/python-mp-test/libs/yellowfin.py", line 268, in minimize
    return self.apply_gradients(grads_and_vars)
  File "/data/python-mp-test/libs/yellowfin.py", line 223, in apply_gradients
    update_hyper_op = self.update_hyper_param()
  File "/data/python-mp-test/libs/yellowfin.py", line 191, in update_hyper_param
    lambda: self._mu_var) )
  File "/usr/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 289, in new_func
    return func(*args, **kwargs)
  File "/usr/lib/python3.5/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1814, in cond
    orig_res_t, res_t = context_t.BuildCondBranch(true_fn)
  File "/usr/lib/python3.5/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1689, in BuildCondBranch
    original_result = fn()
  File "/data/python-mp-test/libs/yellowfin.py", line 190, in <lambda>
    self._mu = tf.identity(tf.cond(self._do_tune, lambda: self.get_mu_tensor(),
  File "/data/python-mp-test/libs/yellowfin.py", line 173, in get_mu_tensor
    roots = tf.py_func(np.roots, [coef], Tout=tf.complex64, stateful=False)
  File "/usr/lib/python3.5/site-packages/tensorflow/python/ops/script_ops.py", line 201, in py_func
    input=inp, token=token, Tout=Tout, name=name)
  File "/usr/lib/python3.5/site-packages/tensorflow/python/ops/gen_script_ops.py", line 56, in _py_func_stateless
    Tout=Tout, name=name)
  File "/usr/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/usr/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
    self._traceback = _extract_stack()

UnknownError (see above for traceback): LinAlgError: Array must not contain infs or NaNs
	 [[Node: update_hyper/cond/PyFuncStateless = PyFuncStateless[Tin=[DT_FLOAT], Tout=[DT_COMPLEX64], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/cpu:0"](update_hyper/cond/ScatterUpdate)]]
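
The last frame inside yellowfin.py is the np.roots call, and the LinAlgError text is the message NumPy raises when the polynomial coefficients are not all finite. The following is a minimal NumPy-only sketch that reproduces the message outside TensorFlow. The coefficient values and the safe_roots helper are made up for illustration and are not part of YellowFin.

```python
import numpy as np


def safe_roots(coef):
    """Return the polynomial roots, or None if any coefficient is NaN/inf.

    Hypothetical helper for illustration; it is not part of yellowfin.py.
    """
    coef = np.asarray(coef, dtype=np.float64)
    if not np.all(np.isfinite(coef)):
        return None
    return np.roots(coef)


# A finite cubic (x^3 - 1) is solved without problems.
ok = safe_roots([1.0, 0.0, 0.0, -1.0])

# One NaN coefficient is enough to trigger the error from the traceback.
bad = [1.0, 0.0, float("nan"), -1.0]
try:
    np.roots(bad)
    raised = False
    message = ""
except np.linalg.LinAlgError as err:
    raised = True
    message = str(err)

print(raised, message)
```

So the exception itself is only a symptom: some statistic feeding the cubic had already become NaN or inf before get_mu_tensor was evaluated.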

Issue Analytics

  • State: open
  • Created 6 years ago
  • Comments: 6 (2 by maintainers)

Top GitHub Comments

1 reaction
JianGoForIt commented, Jul 14, 2017

Hi @ywchan2005,

Thanks for trying out the optimizer. This is most likely caused by exploding gradients in the middle of training.

  1. If it happens at the very beginning, you might want to adjust the initial value a bit.

  2. If it happens in the middle of training, please consider using gradient clipping. There is a discussion with solutions in our PyTorch YellowFin repo here. A similar solution can be applied to the TF repo.

  3. We are working on a better automatic gradient clipping feature. You could also wait a few days for that, but I suggest you start working on option 2 already.
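
As an illustration of the gradient clipping suggested in point 2, here is a framework-agnostic NumPy sketch of clipping by global norm. The function name, the max_norm threshold and the sample gradients are assumptions chosen for the example. TensorFlow and PyTorch ship their own versions (tf.clip_by_global_norm and torch.nn.utils.clip_grad_norm_); how to wire either of them into YFOptimizer is not shown in this thread.

```python
import numpy as np


def clip_by_global_norm(grads, max_norm):
    """Rescale all gradients so that their joint L2 norm is at most max_norm.

    Returns the (possibly rescaled) gradients and the norm before clipping.
    """
    global_norm = float(np.sqrt(sum(np.sum(np.square(g)) for g in grads)))
    if global_norm <= max_norm:
        # Already within the threshold: return unchanged copies.
        return [g.copy() for g in grads], global_norm
    scale = max_norm / global_norm
    return [g * scale for g in grads], global_norm


# Two made-up gradient tensors with joint norm sqrt(9 + 16 + 144) = 13.
grads = [np.array([3.0, 4.0]), np.array([[12.0]])]
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
norm_after = float(np.sqrt(sum(np.sum(np.square(g)) for g in clipped)))

print(norm_before, norm_after)
```

Clipping bounds large but finite gradients. It does not repair gradients that already contain NaN or inf, because the rescaling factor itself becomes non-finite in that case.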

0 reactions
staticfloat commented, Jan 27, 2018

I can confirm that I ran into this with a standard AlexNet architecture trained on the ImageNet corpus using PyTorch. After 7 full epochs (that is, after training on 60928 minibatches, each of size 64), I received the following error:

yellowfin.py:192: RuntimeWarning: invalid value encountered in add
  self._grad_var += global_state['grad_norm_squared_avg'] / debias_factor
/var/storage/shared/msrlabs/sabae/libsmolder_autodeploy/libsmolder/optimizers/yellowfin.py:329: RuntimeWarning: invalid value encountered in double_scalars
  self._mu_t = max(root**2, ( (np.sqrt(dr) - 1) / (np.sqrt(dr) + 1) )**2 )

It would be really nice to have gradient clipping or some kind of workaround for this built into YellowFin. 😃
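
One possible workaround of the kind asked for here is to skip the parameter update whenever a gradient contains NaN or inf, so that the bad values never reach the optimizer's running statistics. The sketch below shows the idea with NumPy. The plain SGD update is only a stand-in and is not YellowFin's update rule, and all names are hypothetical.

```python
import numpy as np


def all_finite(grads):
    """True if no gradient array contains NaN or inf."""
    return all(bool(np.all(np.isfinite(g))) for g in grads)


def guarded_sgd_step(params, grads, lr):
    """Apply a plain SGD update only when every gradient is finite.

    Returns (new_params, applied). On a skipped step the parameters are
    returned unchanged and applied is False.
    """
    if not all_finite(grads):
        return params, False
    return [p - lr * g for p, g in zip(params, grads)], True


params = [np.array([1.0, 2.0])]
good_grads = [np.array([0.5, 0.5])]
bad_grads = [np.array([np.inf, 0.0])]

updated, applied = guarded_sgd_step(params, good_grads, lr=0.1)
unchanged, skipped_applied = guarded_sgd_step(params, bad_grads, lr=0.1)

print(updated, applied, skipped_applied)
```

Skipping a step hides the symptom, not the cause. If steps are skipped often, the learning rate or the model's numerical stability probably still needs attention.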
