Metrics with multiple outputs & standardizing outputs
The current implementation of metrics assumes a very specific set of return values from the training/validation function, `(y_pred, y)`, as can be seen for example in https://github.com/pytorch/ignite/blob/master/ignite/metrics/mean_absolute_error.py#L20.
This has a couple of inconvenient consequences:
1. Existing metrics cannot handle a model with multiple outputs.
2. Users cannot pass additional return values around to be picked up by a custom handler via `state.outputs`.

It would be reasonable to standardize the set of return values expected from training/validation functions if the user had the option of putting custom values in the calling engine's `state`. This would require exposing `state` to the training/validation functions. This would solve (2), but metrics would need to be further adjusted to solve (1).
One way of standardizing the return values would be to always return only the loss, model outputs, and targets. If `state` is ephemeral but is passed to the training/validation function, this would be of the form:

```python
return loss, model_outputs, targets, state
```

If `state` were instead exposed on an `Engine` as described in #117, this would be of the form:

```python
return loss, model_outputs, targets
```
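As an illustration of the second form, a user-defined training step might look roughly like the following. This is purely a sketch: the two-headed model, `optimizer`, and `loss_fn` are supplied by the user, and nothing here is an existing ignite API.

```python
def training_step(batch, model, optimizer, loss_fn):
    # Illustrative only: a two-headed model returning the proposed
    # (loss, model_outputs, targets) triple.
    inputs, target_a, target_b = batch
    output_a, output_b = model(inputs)  # two outputs from one forward pass

    loss = loss_fn(output_a, target_a) + loss_fn(output_b, target_b)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Outputs and targets are returned as parallel tuples so that a
    # metric can select a single head by index (see below).
    return loss.item(), (output_a, output_b), (target_a, target_b)
```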
Every element in `model_outputs` would be paired with the corresponding element in `targets`. This would allow a `Metric` to optionally track a single output of a multi-output model, changing
```python
class Metric(object):
    def __init__(self):
        self.reset()

    def iteration_completed(self, engine, state):
        self.update(state.output)
```
to
```python
class Metric(object):
    def __init__(self, output_index=None):
        self.output_index = output_index
        self.reset()

    def iteration_completed(self, engine, state):
        output = state.output
        target = state.target
        if self.output_index is not None:
            output = output[self.output_index]
            target = target[self.output_index]
        self.update(output, target)
```
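Building on that proposed base class, a concrete metric for one head of a two-output model might look as follows. This is a sketch only: the `output_index` argument and the `update(output, target)` signature are part of this proposal, not the current library, and the `compute` method is assumed by analogy with existing metrics.

```python
class MeanAbsoluteError(Metric):
    """Sketch of a per-head MAE built on the proposed Metric above."""

    def reset(self):
        self._sum_abs_error = 0.0
        self._num_examples = 0

    def update(self, output, target):
        self._sum_abs_error += (output - target).abs().sum().item()
        self._num_examples += target.shape[0]

    def compute(self):
        return self._sum_abs_error / max(self._num_examples, 1)

# One metric per head of a two-output model:
mae_head_0 = MeanAbsoluteError(output_index=0)
mae_head_1 = MeanAbsoluteError(output_index=1)
```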
Top GitHub Comments
I agree with the statements above; let’s go ahead and merge `Trainer` and `Evaluator` into one and make metrics work for the `Engine` 😃 Thanks for starting the fruitful discussions @veugene!
@alykhantejani I’m computing a Dice loss, evaluated per batch as if each batch were its own volume (this is better than evaluating per element in the batch, since the latter has too high a variance). I am also accumulating the counts needed to compute a dataset-wide Dice score by the end of the epoch; this second measure is similar to the `Metric` code that was recently introduced.
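To make the count-accumulation part concrete, here is a minimal sketch of the pattern. The class and method names are illustrative, not ignite's API, and predictions and targets are assumed to be binary masks of the same shape.

```python
import torch

class RunningDice(object):
    """Accumulate the counts for a dataset-wide Dice score.

    Illustrative only: not ignite's API. `y_pred` and `y` are assumed
    to be binary masks of the same shape.
    """

    def __init__(self, smooth=1e-7):
        self.smooth = smooth
        self.reset()

    def reset(self):
        self.intersection = 0.0
        self.total = 0.0

    def update(self, y_pred, y):
        # Called once per minibatch; only two scalars are kept around,
        # so the per-iteration overhead is negligible.
        y_pred = y_pred.float()
        y = y.float()
        self.intersection += torch.sum(y_pred * y).item()
        self.total += torch.sum(y_pred).item() + torch.sum(y).item()

    def compute(self):
        # Dice = 2 * |A ∩ B| / (|A| + |B|), over every batch seen so far.
        return (2.0 * self.intersection + self.smooth) / (self.total + self.smooth)
```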
@jasonkriss @alykhantejani It is very common to accumulate measures over a training epoch even as the model is changing. The training loss, for example, is typically reported as an average over all minibatches since the start of the epoch, and other measures are reported similarly. While this is not ideal for evaluating model performance, it is still informative; model performance is evaluated on the validation set anyway.
As @alykhantejani points out, running an Evaluator on the training set requires a second pass through the data. I would never do this, as it’s too expensive on large datasets. The advantage of accumulating measures over an epoch in the training loop is that it happens online, with minimal overhead. This is likely why this compromise is such a common pattern.
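For illustration, the running-average bookkeeping amounts to something like the following plain-Python sketch, where `train_step` and `train_loader` stand in for the user’s own training function and data iterator.

```python
def epoch_running_loss(train_step, train_loader):
    """Online mean of per-batch losses, updated with one extra addition
    per minibatch, so no second pass over the data is needed.

    `train_step` and `train_loader` are placeholders for the user's own
    training function (returning a scalar loss) and data iterator.
    """
    running_loss = 0.0
    for num_batches, batch in enumerate(train_loader, start=1):
        loss = train_step(batch)
        # Incremental mean: equivalent to summing and dividing at the end.
        running_loss += (loss - running_loss) / num_batches
    return running_loss
```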
I suppose you’re right that if these were made to have the same input and output formats, they could be merged. In that case, the user would still have the option of running the merged `Engine` as a metric “Evaluator” on the training data in `eval()` mode, as in the pattern presented by @jasonkriss.
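A rough sketch of that usage, assuming the merged engine exposes a `run(data)` method and that `evaluator`, `model`, and `train_loader` are the user’s own objects (all names here are placeholders, not a confirmed API):

```python
import torch

def evaluate_on_training_set(evaluator, model, train_loader):
    # Illustrative only: reuse the merged engine (with metrics attached)
    # to compute metrics over the training set in a separate pass,
    # with the model in eval mode and gradients disabled.
    model.eval()
    with torch.no_grad():
        evaluator.run(train_loader)
    model.train()
```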