Crash when metric is NaN

When using the FastAIV2 pruning callback, the trial crashes as soon as the metric becomes NaN. The error is:

sqlalchemy.exc.IntegrityError: (sqlite3.IntegrityError) NOT NULL constraint failed: trial_intermediate_values.intermediate_value [SQL: INSERT INTO trial_intermediate_values (trial_id, step, intermediate_value) VALUES (?, ?, ?)] [parameters: (52, 20, nan)]

My workaround is (inside FastAIV2PruningCallback):

    def after_epoch(self) -> None:
        super().after_epoch()
        # self.idx is set by TrackTrackerCallback

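        out = self.recorder.final_record[self.idx]
        # NaN cannot be stored as an intermediate value, so report the worst
        # possible value for the study direction instead.
        if np.isnan(out):
            minimize = self.trial.study.direction == optuna.study.StudyDirection.MINIMIZE
            out = np.inf if minimize else -np.inf
        self.trial.report(out, step=self.epoch)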

        if self.trial.should_prune():
            raise CancelFitException()
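
The NaN handling in the workaround above can be isolated into a small helper, which makes it easy to check on its own. This is only a sketch: the name `sanitize_metric` and the boolean `minimize` flag are hypothetical stand-ins for the `StudyDirection` comparison, and it uses the standard library instead of NumPy.

```python
import math


def sanitize_metric(value: float, minimize: bool) -> float:
    """Map NaN to the worst possible value for the given direction.

    Hypothetical helper: `minimize` stands in for the check
    `study.direction == StudyDirection.MINIMIZE` used in the workaround.
    """
    if math.isnan(value):
        # Worst case is +inf when minimizing and -inf when maximizing.
        return math.inf if minimize else -math.inf
    return value


nan_when_minimizing = sanitize_metric(float("nan"), minimize=True)
nan_when_maximizing = sanitize_metric(float("nan"), minimize=False)
regular_value = sanitize_metric(0.25, minimize=True)
```

Finite values pass through unchanged, so the helper could be applied to every metric before it is reported.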

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 7

Top GitHub Comments

3 reactions
nzw0301 commented, Jul 15, 2021

Thank you for the clarification! Indeed, I can reproduce the error with the following simpler code:

import optuna


def objective(trial):
    for i in range(10):
        trial.report(float('nan'), i)

    return float('nan')

pruner = optuna.pruners.MedianPruner()
study = optuna.create_study(direction="maximize", pruner=pruner, storage="sqlite:///example.db")
study.optimize(objective, n_trials=10)

This might be related to a storage issue in Optuna, so I will transfer this issue to the main repository.
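
The storage behaviour behind the error can be seen without Optuna. SQLite stores a NaN float as NULL, so a `NOT NULL` column rejects it, which matches the `IntegrityError` in the report. The sketch below uses only the standard-library `sqlite3` module; the table `intermediate_values` and the helper `try_insert` are simplified, hypothetical stand-ins and not Optuna's actual schema or code.

```python
import math
import sqlite3


def try_insert(conn: sqlite3.Connection, value: float) -> bool:
    """Return True if the value was stored, False if NOT NULL rejected it."""
    try:
        conn.execute(
            "INSERT INTO intermediate_values (step, value) VALUES (?, ?)",
            (0, value),
        )
        return True
    except sqlite3.IntegrityError:
        return False


# In-memory database with a simplified, hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE intermediate_values (step INTEGER, value REAL NOT NULL)"
)

nan_stored = try_insert(conn, float("nan"))  # NaN is bound as NULL
inf_stored = try_insert(conn, math.inf)      # infinity is an ordinary REAL
print(nan_stored, inf_stored)
```

This is consistent with the workaround in the issue, which replaces NaN with an infinite value before reporting. Other storage backends may treat infinity differently, so this only illustrates the SQLite case.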

1 reaction
nzw0301 commented, Aug 16, 2021

@lsc64 thank you for letting us know about it! Indeed, this issue is already known. Let me close this issue.
