Metrics with multiple outputs & standardizing outputs
The current implementation of metrics assumes a very specific set of return values from the training/validation function, `(y_pred, y)`, as can be seen for example in https://github.com/pytorch/ignite/blob/master/ignite/metrics/mean_absolute_error.py#L20.
This has a couple of inconvenient consequences:
1. Existing metrics cannot handle a model with multiple outputs.
2. Users cannot pass additional return values around to be picked up by a custom handler via `state.outputs`.

It would be reasonable to standardize the set of return values expected from training/validation functions if the user had the option of putting custom values in the calling engine's `state`. This would require exposing `state` to the training/validation functions. This would solve (2), but metrics would need to be further adjusted to solve (1).
One way of standardizing the return values would be to always return only the loss, model outputs, and targets. If `state` is ephemeral but is passed to the training/validation function, this would be of the form:

```python
return loss, model_outputs, targets, state
```

If `state` were instead exposed on an `Engine` as described in #117, this would be of the form:

```python
return loss, model_outputs, targets
```
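As an illustration of the second form, a user-defined training step might look roughly like the following. This is purely a sketch: the two-headed model, `optimizer`, and `loss_fn` are supplied by the user, and nothing here is an existing ignite API.

```python
def training_step(batch, model, optimizer, loss_fn):
    # Illustrative only: a two-headed model returning the proposed
    # (loss, model_outputs, targets) triple.
    inputs, target_a, target_b = batch
    output_a, output_b = model(inputs)  # two outputs from one forward pass

    loss = loss_fn(output_a, target_a) + loss_fn(output_b, target_b)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Outputs and targets are returned as parallel tuples so that a
    # metric can select a single head by index (see below).
    return loss.item(), (output_a, output_b), (target_a, target_b)
```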
Every element in `model_outputs` would be paired with the corresponding element in `targets`. This would allow a `Metric` to optionally track a single output of a multi-output model, changing
```python
class Metric(object):
    def __init__(self):
        self.reset()

    def iteration_completed(self, engine, state):
        self.update(state.output)
```
to
```python
class Metric(object):
    def __init__(self, output_index=None):
        self.output_index = output_index
        self.reset()

    def iteration_completed(self, engine, state):
        output = state.output
        target = state.target
        if self.output_index is not None:
            output = output[self.output_index]
            target = target[self.output_index]
        self.update(output, target)
```
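Building on that proposed base class, a concrete metric for one head of a two-output model might look as follows. This is a sketch only: the `output_index` argument and the `update(output, target)` signature are part of this proposal, not the current library, and the `compute` method is assumed by analogy with existing metrics.

```python
class MeanAbsoluteError(Metric):
    """Sketch of a per-head MAE built on the proposed Metric above."""

    def reset(self):
        self._sum_abs_error = 0.0
        self._num_examples = 0

    def update(self, output, target):
        self._sum_abs_error += (output - target).abs().sum().item()
        self._num_examples += target.shape[0]

    def compute(self):
        return self._sum_abs_error / max(self._num_examples, 1)

# One metric per head of a two-output model:
mae_head_0 = MeanAbsoluteError(output_index=0)
mae_head_1 = MeanAbsoluteError(output_index=1)
```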
Top GitHub Comments
I agree with the statements above; let’s go ahead and merge `Trainer` and `Evaluator` into one and make metrics work for the `Engine` 😃 Thanks for starting the fruitful discussions @veugene!
@alykhantejani I’m computing a Dice loss, evaluated per batch as if each batch were its own volume (this is better than evaluating per element in the batch, since the latter has too high a variance). I am also accumulating the counts needed to compute a dataset-wide Dice score by the end of the epoch; this second measure is similar to the `Metric` code that was recently introduced.
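To make the count-accumulation part concrete, here is a minimal sketch of the pattern. The class and method names are illustrative, not ignite's API, and predictions and targets are assumed to be binary masks of the same shape.

```python
import torch

class RunningDice(object):
    """Accumulate the counts for a dataset-wide Dice score.

    Illustrative only: not ignite's API. `y_pred` and `y` are assumed
    to be binary masks of the same shape.
    """

    def __init__(self, smooth=1e-7):
        self.smooth = smooth
        self.reset()

    def reset(self):
        self.intersection = 0.0
        self.total = 0.0

    def update(self, y_pred, y):
        # Called once per minibatch; only two scalars are kept around,
        # so the per-iteration overhead is negligible.
        y_pred = y_pred.float()
        y = y.float()
        self.intersection += torch.sum(y_pred * y).item()
        self.total += torch.sum(y_pred).item() + torch.sum(y).item()

    def compute(self):
        # Dice = 2 * |A ∩ B| / (|A| + |B|), over every batch seen so far.
        return (2.0 * self.intersection + self.smooth) / (self.total + self.smooth)
```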
@jasonkriss @alykhantejani It is very common to accumulate measures over a training epoch even as the model is changing. The training loss, for example, is typically reported as an average over all minibatches since the start of the epoch, and other measures are reported similarly. While this is not ideal for evaluating model performance, it is still informative; model performance is evaluated on the validation set anyway.
As @alykhantejani points out, running an Evaluator on the training set requires a second pass through the data. I would never do this, as it’s too expensive on large datasets. The advantage of accumulating measures over an epoch in the training loop is that it happens online, with minimal overhead. This is likely why this compromise is such a common pattern.
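For illustration, the running-average bookkeeping amounts to something like the following plain-Python sketch, where `train_step` and `train_loader` stand in for the user’s own training function and data iterator.

```python
def epoch_running_loss(train_step, train_loader):
    """Online mean of per-batch losses, updated with one extra addition
    per minibatch, so no second pass over the data is needed.

    `train_step` and `train_loader` are placeholders for the user's own
    training function (returning a scalar loss) and data iterator.
    """
    running_loss = 0.0
    for num_batches, batch in enumerate(train_loader, start=1):
        loss = train_step(batch)
        # Incremental mean: equivalent to summing and dividing at the end.
        running_loss += (loss - running_loss) / num_batches
    return running_loss
```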
I suppose you’re right that if these were made to have the same input and output formats, they could be merged. In that case, the user would still have the option of running the merged `Engine` as a metric “Evaluator” on the training data in `eval()` mode, as in the pattern presented by @jasonkriss.
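A rough sketch of that usage, assuming the merged engine exposes a `run(data)` method and that `evaluator`, `model`, and `train_loader` are the user’s own objects (all names here are placeholders, not a confirmed API):

```python
import torch

def evaluate_on_training_set(evaluator, model, train_loader):
    # Illustrative only: reuse the merged engine (with metrics attached)
    # to compute metrics over the training set in a separate pass,
    # with the model in eval mode and gradients disabled.
    model.eval()
    with torch.no_grad():
        evaluator.run(train_loader)
    model.train()
```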