Improper normalization of the scores?
In train.py, you normalize the scores according to:
test_map = [list() for p in pool_layers]
for l, p in enumerate(pool_layers):
    test_norm = torch.tensor(test_dist[l], dtype=torch.double)  # EHWx1
    test_norm -= torch.max(test_norm)  # normalize likelihoods to (-Inf:0] by subtracting a constant
    test_prob = torch.exp(test_norm)   # convert to probs in range [0:1]
    test_mask = test_prob.reshape(-1, height[l], width[l])
    # upsample
    test_map[l] = F.interpolate(test_mask.unsqueeze(1),
                                size=c.crp_size, mode='bilinear',
                                align_corners=True).squeeze().numpy()
# score aggregation
score_map = np.zeros_like(test_map[0])
for l, p in enumerate(pool_layers):
    score_map += test_map[l]
This normalization is fine as long as it is applied to a single map, since subtracting the maximum is a monotonically increasing transformation and does not change the ranking. Once the maps from the different layers are added up, however, it no longer makes sense to me: the relative weighting of the score maps in the aggregation (the last loop) depends on the test set, or more precisely on the maximum of each individual map over the test set. Am I missing something here, or is this normalization improper?
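To illustrate the concern, here is a small toy sketch (not taken from the repository; the tensors and their values are made up) showing how subtracting each layer's own test-set maximum implicitly re-weights the layers in the sum:

import torch

# Hypothetical log-likelihoods of three test samples for two layers
# whose scores live on very different scales.
log_lik_a = torch.tensor([-100.0, -98.0, -90.0], dtype=torch.double)  # layer A over the test set
log_lik_b = torch.tensor([-10.0, -8.0, -0.5], dtype=torch.double)     # layer B over the test set

# Per-layer normalization as in the quoted code: subtract each layer's own test-set max.
prob_a = torch.exp(log_lik_a - log_lik_a.max())
prob_b = torch.exp(log_lik_b - log_lik_b.max())
print(prob_a + prob_b)  # aggregation by plain addition

# Adding one extra test sample that only raises layer A's maximum rescales
# prob_a for every other sample while leaving prob_b untouched, so the relative
# contribution of the two layers to the sum depends on the test set itself.
log_lik_a_ext = torch.cat([log_lik_a, torch.tensor([-50.0], dtype=torch.double)])
prob_a_ext = torch.exp(log_lik_a_ext - log_lik_a_ext.max())
print(prob_a_ext[:3] + prob_b)  # same three samples, different aggregated scores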
Issue Analytics
- State:
- Created a year ago
- Comments:9 (4 by maintainers)
Top GitHub Comments
@marco-rudolph hmm, I think you are right that the max is improper for the multi-scale case if we cannot use any statistics. In practice, we probably know past statistics and can assume the max.
This might hold in some practical cases, but it cannot be assumed in the anomaly detection setting. Furthermore, the max is very sensitive to the test set, since the exponentiation can produce very large values and the scores can explode. Using only train data would change the weighting a lot. In general, the use of the max is very sensitive to outliers. In practice, I observed that a simple addition without weighting worsens the mean AUPRO score by about 3%, which is quite significant compared to other work.

@alevangel well, you can replace test with train
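For reference, a minimal sketch of that suggested workaround, with the normalization constant estimated on the training set instead of the test set. It assumes a train_dist list with the same per-layer layout as test_dist (log-likelihoods collected on train data); the remaining variables are the ones from the snippet quoted in the issue:

import numpy as np
import torch
import torch.nn.functional as F

# Sketch only: train_dist is assumed to hold per-layer log-likelihoods
# collected on the training set, analogous to test_dist.
test_map = [list() for p in pool_layers]
for l, p in enumerate(pool_layers):
    train_norm = torch.tensor(train_dist[l], dtype=torch.double)
    test_norm = torch.tensor(test_dist[l], dtype=torch.double)  # EHWx1
    test_norm -= torch.max(train_norm)  # constant estimated on train data, independent of the test set
    test_prob = torch.exp(test_norm)    # no longer guaranteed to be <= 1
    test_mask = test_prob.reshape(-1, height[l], width[l])
    # upsample
    test_map[l] = F.interpolate(test_mask.unsqueeze(1),
                                size=c.crp_size, mode='bilinear',
                                align_corners=True).squeeze().numpy()
# score aggregation
score_map = np.zeros_like(test_map[0])
for l, p in enumerate(pool_layers):
    score_map += test_map[l]

As noted in the comment above, this still affects the relative weighting of the layers, since the constants now come from train-set rather than test-set maxima.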