
AP computation for Revisited Oxford/Paris datasets

See original GitHub issue

Hi, congratulations on the great work and thanks for providing this reference implementation!

I have a question regarding AP computation. The convention for the Oxford/Paris datasets (and their revisited extensions) is to use an interpolation method: the two precision values adjacent to each relevant result are averaged and then multiplied by the recall step (see the implementation by @filipradenovic here, and my reimplementation here). This differs from the “finite sum” method (Wikipedia reference), which I believe is what sklearn.metrics.average_precision_score implements, and that is the function your code uses. Please correct me if I am wrong here 😃
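
In symbols (my own paraphrase of the two definitions, so please double-check), with $P_n$ and $R_n$ the precision and recall at cutoff $n$ of the ranked list (recall only increases at relevant results), the two conventions are:

$$\mathrm{AP}_{\text{finite sum}} = \sum_n (R_n - R_{n-1})\,P_n, \qquad \mathrm{AP}_{\text{interpolation}} = \sum_n (R_n - R_{n-1})\,\frac{P_{n-1} + P_n}{2}.$$

That is, the interpolation method replaces the precision at each recall step with the average of the two adjacent precision points, i.e. a trapezoidal rule over the precision-recall curve.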

So, if what I state above about your code is correct, I am wondering what the mAP figures would be if you used the Revisited datasets’ convention instead.

To illustrate the differences in implementation, here is a toy example where the AP computed by the sklearn implementation is much higher than the one computed under the Oxf/Par datasets’ convention (0.5 versus 0.3333):

```python
# Similarities and labels.
s = [3, 4, 1, 2]
y = [1, 0, 1, 0]

# Computed with the library used in your code; produces AP = 0.5.
from sklearn.metrics import average_precision_score
sklearn_ap = average_precision_score(y, s)
print("sklearn AP: %f" % sklearn_ap)

# Computed with the Revisited-dataset convention, using my code; produces AP = 0.333333.
# Note: see installation instructions at:
# https://github.com/tensorflow/models/blob/master/research/delf/INSTALL_INSTRUCTIONS.md
from delf.python.detect_to_retrieve import dataset
import numpy as np

# Collect the 0-indexed ranks of the positives, with items sorted by decreasing similarity.
ranks = []
for rank, i in enumerate(np.argsort(-np.array(s))):
  if y[i]:
    ranks.append(rank)
revisited_ap = dataset.ComputeAveragePrecision(ranks)
print("revisited AP: %f" % revisited_ap)
```

Overall, my guess would be that the results would not differ by that much since these datasets are large, but it would be good if we can be sure of that. Also, as I said above, I may have missed something, so please feel free to correct me 😃

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

almazan commented on Aug 28, 2019

Hi @andrefaraujo,

You are right, sklearn.metrics.average_precision_score uses the “finite sum” method, which is different from the “interpolation” method used by Radenovic et al. in the Revisited versions of Oxford5K and Paris6K. This is something that we overlooked, so thank you so much for pointing it out.

I updated the code using your AP computation function and re-ran all the evaluations. In line with your guess, the difference between the two implementations is less significant when the dataset and the number of relevant images are large; in the worst case, the AP figure decreases by less than 0.5% mAP. For example, these are the different mAPs for RParis6K and ROxford5K:

Using the “finite sum” method:

RParis6K (med/hard) = 80.35 / 61.03
ROxford5K (med/hard) = 67.36 / 42.76

Using the “interpolation” method:

RParis6K (med/hard) = 80.31 / 60.86
ROxford5K (med/hard) = 67.13 / 42.26

In any case, I agree that our evaluation should also follow the dataset convention. So, in addition to updating the code, I will also update the numbers in the README with the new AP computation, and check whether there is any case in our experiments where this difference is more significant and problematic.
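
As a quick, informal way to probe how large the gap can get, here is an illustrative sketch (not our evaluation code) that compares the two conventions on random rankings; it reuses the interpolated_ap helper sketched earlier in the thread:

```python
# Randomized sanity check (illustrative only): compare the "finite sum" AP from
# sklearn against the "interpolation" AP on random rankings, and track the
# largest gap observed.
import numpy as np
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(0)
max_gap = 0.0
for _ in range(1000):
  num_items = 100
  num_pos = int(rng.integers(1, 20))  # Few positives => bigger per-positive effect.
  y = np.zeros(num_items, dtype=int)
  y[rng.choice(num_items, size=num_pos, replace=False)] = 1
  s = rng.random(num_items)
  finite_sum = average_precision_score(y, s)
  interp = interpolated_ap(list(y[np.argsort(-s)]))  # Labels in decreasing-score order.
  max_gap = max(max_gap, abs(finite_sum - interp))
print("largest observed gap: %f" % max_gap)
```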

Thanks again! 😃

Edit: link to the AP function

filipradenovic commented on Sep 13, 2019

Adding some resources to this discussion, in case someone benefits from seeing them:

  • Simple example of “interpolation” AP computation: revisitop issue #2
  • Original “interpolation” implementation from the Oxford Buildings dataset that set the trend: compute_ap.cpp