Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Getting nan for custom metric while training

See original GitHub issue

I implemented Mean Average Precision (MAP@all) in tensorflow like this:

def mean_avg_prec_tf(y_true, y_pred):
    dims = tf.shape(y_true)
    n = dims[0]
    k = dims[1]

    _, top_idx = tf.nn.top_k(y_pred, k)

    y_true = tf.to_float(y_true)
    top_idx = tf.to_float(top_idx)

    label_idx = tf.concat(1, [y_true, top_idx])
    label_idx = tf.reshape(label_idx, [n, 2, k])

    def avg_prec(label_idx):
        label = label_idx[0]
        idx = label_idx[1]
        idx = tf.to_int32(idx)
        ordered_pred = tf.gather(label, idx)
        prec = ordered_pred * tf.cumsum(ordered_pred)
        prec /= tf.to_float(tf.range(1, k + 1))
        prec = tf.reduce_sum(prec) / tf.reduce_sum(ordered_pred)
        return prec

    precs = tf.map_fn(avg_prec, label_idx)
    return tf.reduce_mean(precs)

This gives me a nan on training set during training but the correct value for the validation set. Any idea how I can fix this?

Issue Analytics

State:
Created 7 years ago
Comments:5 (2 by maintainers)

Top GitHub Comments

2reactions

joelthchaocommented, Jan 20, 2017

you can use K.epsilon() for 1e-12

1reaction

Nilabhracommented, Jan 19, 2017

Added a buffer to the denomicator:

def mean_avg_prec_tf(y_true, y_pred):
    dims = tf.shape(y_true)
    n = dims[0]
    k = dims[1]

    _, top_idx = tf.nn.top_k(y_pred, k)

    y_true = tf.to_float(y_true)
    top_idx = tf.to_float(top_idx)

    label_idx = tf.concat(1, [y_true, top_idx])
    label_idx = tf.reshape(label_idx, [n, 2, k])

    def avg_prec(label_idx):
        label = label_idx[0]
        idx = label_idx[1]
        idx = tf.to_int32(idx)
        ordered_pred = tf.gather(label, idx)
        prec = ordered_pred * tf.cumsum(ordered_pred)
        prec /= tf.to_float(tf.range(1, k + 1))
        s = tf.reduce_sum(ordered_pred) + 1e-12
        prec = tf.reduce_sum(prec) / s
        return prec

    precs = tf.map_fn(avg_prec, label_idx)
    return tf.reduce_sum(precs) / (tf.to_float(tf.count_nonzero(precs)) + 1e-12)

This is working for now. Not sure if there is a clever way to solve this.

Closing the issue. Thanks for the help.

Top Results From Across the Web

Custom metric Turns to NaN after many steps in each epoch

As I begin an epoch, I get values printing out for the metrics but after many steps one metrics returns NaN and a...

Capturing a Training State in TensorFlow | by Chaim Rand

We customize the train step to test for NaN gradients before applying them to the model weights. If a NaN gradient is discovered...

How to Use Metrics for Deep Learning with Keras in Python

How Keras metrics work and how you can use them when training your ... You can get an idea of how to write...

Callbacks - Keras 2.0.6. Documentation

Callback that accumulates epoch averages of metrics. This callback is automatically ... Callback that terminates training when a NaN loss is encountered.

Python tips and tricks - 7: Continuing keras model ... - YouTube

Loading a keras model and continuing training When using custom loss function and metrics .No code to share with this video.