Image interpolation method for downscaling (nearest neighbour, antialias)


I was testing different resizing algorithms and I noticed that the Nearest Neighbour algorithm is way faster than Antialiasing.

I was just wondering why ImageHash uses Image.ANTIALIAS rather than Image.NEAREST for an image that is only processed by the program (we don’t really care how the image looks as long as we have the features).

Issue Analytics

  • State: open
  • Created 3 years ago
  • Comments: 12 (6 by maintainers)

Top GitHub Comments

3 reactions
RandomNameUser commented, Jan 31, 2022

Having messed around with image resizing in Python, I have a few comments:

Using NEAREST to generate the hash sounds like a bad idea. NEAREST is very sensitive to minor image shifts: it keeps just one source pixel per output pixel, so which value ends up in the result depends on exactly where the sampling grid falls. The hashes for similar but slightly shifted images will therefore be very different, which is (IMHO) not the goal of the hashing. So a better algorithm is needed.
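
To make the shift sensitivity concrete, here is a toy 1-D sketch in plain Python. It is not Pillow's code: `nearest_downscale`, `box_downscale` and `shift_right` are made-up helpers, and a simple box average stands in for the ANTIALIAS filter (which is a Lanczos filter in Pillow). The input is a deliberately extreme row of alternating dark and bright samples.

```python
def nearest_downscale(row, out_len):
    """Keep a single source sample (the block's centre) per output sample."""
    step = len(row) // out_len
    return [row[i * step + step // 2] for i in range(out_len)]


def box_downscale(row, out_len):
    """Average all source samples that fall into each output sample."""
    step = len(row) // out_len
    return [sum(row[i * step:(i + 1) * step]) / step for i in range(out_len)]


def shift_right(row, n=1):
    """Cyclically shift the row by n samples."""
    return row[-n:] + row[:-n]


def max_change(a, b):
    """Largest per-sample difference between two rows of equal length."""
    return max(abs(x - y) for x, y in zip(a, b))


row = [0, 255] * 8          # 16 samples of fine detail
shifted = shift_right(row)  # the same content, moved by one sample

nearest_change = max_change(nearest_downscale(row, 4), nearest_downscale(shifted, 4))
box_change = max_change(box_downscale(row, 4), box_downscale(shifted, 4))
print(nearest_change, box_change)  # prints: 255 0.0
```

With nearest-neighbour sampling the one-sample shift flips every output value from black to white, while the averaged version does not change at all. Real photos are far less extreme, but the effect points in the same direction.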

A common approach to speed up image scaling in Python is to do it in two steps: scale the image down to 2x, 4x or 8x the target size using NEAREST, and then do a final step using ANTIALIAS. The result will not be exactly the same as a full antialias, but it is much faster and significantly closer to it than plain NEAREST.
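
A minimal sketch of that two-step idea, assuming Pillow is installed. `two_step_resize` and its `oversample` parameter are names invented for this example, and `Image.LANCZOS` is used for the filtered pass because `Image.ANTIALIAS` was an alias for it and no longer exists in Pillow 10 and later.

```python
from PIL import Image


def two_step_resize(img, size, oversample=4):
    """Cheap NEAREST pass to `oversample` x the target size, then a filtered pass."""
    width, height = size
    # Step 1: throw away most pixels quickly, but keep some headroom.
    rough = img.resize((width * oversample, height * oversample), Image.NEAREST)
    # Step 2: the expensive filter only has to look at the small intermediate.
    return rough.resize((width, height), Image.LANCZOS)


# Synthetic grey image so the sketch runs without any test files.
source = Image.new("L", (640, 480), 128)
small = two_step_resize(source, (8, 8))
print(small.size)  # (8, 8)
```

A larger `oversample` moves the result closer to a full filtered resize at the cost of more work in the second step.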

In addition, PIL has a dedicated method for downscaling images quickly: thumbnail. Thumbnail is most effective if it is used before the image data is loaded, as it can tell the file loader to only load the data needed for the smaller image. This works very well for JPEG, less well for other formats.
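
As a rough illustration of calling thumbnail before the pixel data is loaded, assuming Pillow is installed: `small_from_file` and `oversample` are hypothetical names, and the JPEG is generated in memory only so that the sketch is self-contained.

```python
from io import BytesIO

from PIL import Image


def small_from_file(fp, size, oversample=4):
    """Open lazily, shrink with thumbnail(), then do the final filtered resize."""
    img = Image.open(fp)  # only the header is read at this point
    # thumbnail() works in place and keeps the aspect ratio. Because the pixel
    # data has not been loaded yet, the JPEG decoder gets the chance to decode
    # at a reduced scale instead of at full resolution.
    img.thumbnail((size[0] * oversample, size[1] * oversample), Image.NEAREST)
    return img.resize(size, Image.LANCZOS)


# Build a JPEG in memory so the sketch runs without any files on disk.
buf = BytesIO()
Image.new("RGB", (1024, 768), (200, 120, 40)).save(buf, format="JPEG")
buf.seek(0)

small = small_from_file(buf, (8, 8))
print(small.size)  # (8, 8)
```

Calling `img.load()` (or anything else that touches the pixels) before `thumbnail()` would forfeit the reduced-size decode, which is presumably why the measurements here avoid preloading.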

I did run some tests (code at the end). I used the same test images as @Animenosekai, but I did not preload the data. I did warm the file system cache by loading all images once and discarding them.

Here is an example image and its 8x8 versions for comparison:

[Images: Original, Nearest, Anti-Alias, Nearest+AA, Thumbnail]

In this view the differences between AA, N+AA and TH are almost invisible, but the hashes do pick them up. I tested with dhash: NEAREST has an average distance of 16 to AA (clearly unacceptable), while N+AA and TH both have 3.65, which is much better but still noticeable. This probably only matters where there are old, existing hashes to compare against; for new projects and databases I wouldn’t expect a difference in detection rate between these algorithms.
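
For readers unfamiliar with the distance figures: subtracting two imagehash hashes (as the test code does with `aa - n`) gives the number of differing bits, i.e. the Hamming distance. The sketch below reproduces that idea on plain integers; `hamming` and `average_distance` are illustrative helpers, not part of imagehash, and the bit patterns are made up.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes (as integers) differ."""
    return bin(a ^ b).count("1")


def average_distance(reference, candidates):
    """Mean Hamming distance between hashes of the same images."""
    pairs = list(zip(reference, candidates))
    return sum(hamming(r, c) for r, c in pairs) / len(pairs)


# Two "images", each hashed with a reference and a candidate method.
reference = [0b11110000, 0b10101010]
candidates = [0b11110011, 0b10101010]
print(average_distance(reference, candidates))  # 1.0
```

For scale: two unrelated 64-bit hashes differ in about 32 bits on average, so an average distance of 16 between hashes of the same image is large.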

Interestingly I got fairly different performance numbers than @Animenosekai. In my tests without preloading data and with calculating the hash (from the scaled-down image) the differences between the algorithms were very small:

Mode                      Time
Image.ANTIALIAS           8.469 sec.
Image.NEAREST             8.167 sec.
Image.NEAREST+ANTIALIAS   7.996 sec.
Image.THUMBNAIL4          8.418 sec.

So, not very exciting. 😒

However, the test faces are only 64x64 pixels, which is very small. I tested it with some larger (~3000x4000) JPEG test images, and got more interesting results:

Mode                      Time          Speedup   Avg. Distance
Image.ANTIALIAS           23.155 sec.   -         -
Image.NEAREST             16.357 sec.   1.42      16.54
Image.NEAREST+ANTIALIAS   14.079 sec.   1.64      3.56
Image.THUMBNAIL4          6.757 sec.    3.43      4.78

So THUMBNAIL is about 3.4x faster than AA, and about 2.4x faster than NEAREST.

So at this point I’m not sure what I would recommend. My use case is more like the second test: large images on disk. In that case THUMBNAIL makes a big difference, so I would love to see it in imagehash.

Just my $.02…

Test Code:

from os import listdir
from time import time
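from PIL import Image
import imagehash

# Image.ANTIALIAS was removed in Pillow 10; it was an alias for Image.LANCZOS.
if not hasattr(Image, "ANTIALIAS"):
    Image.ANTIALIAS = Image.LANCZOS

# Pick one data set by leaving exactly one of these lines active.
# DATA_DIR = "testimages_64/"
DATA_DIR = "testimages_4k/"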
# Collect the paths of all files in the data directory.
imagefiles = [f"{DATA_DIR}{name}" for name in listdir(DATA_DIR)]

print("Warming filesystem cache...")
for i in imagefiles:
    image = Image.open(i)
    image.load()
print("Done. Starting measurements...")

nhashes = []
start = time()
for i in imagefiles:
    image = Image.open(i)
    image = image.resize((8, 8), Image.NEAREST)
    nhashes.append(imagehash.dhash(image))
ntime = time() - start
print(f"Time taken (Image.NEAREST): {ntime:.3f} sec.")

ahashes = []
start = time()
for i in imagefiles:
    image = Image.open(i)
    image = image.resize((8, 8), Image.ANTIALIAS)
    ahashes.append(imagehash.dhash(image))
atime = time() - start
print(f"Time taken (Image.ANTIALIAS): {atime:.3f} sec.")

nahashes = []
start = time()
for i in imagefiles:
    image = Image.open(i)
    image = image.resize((8*4, 8*4), Image.NEAREST).resize((8, 8), Image.ANTIALIAS)
    nahashes.append(imagehash.dhash(image))
natime = time() - start
print(f"Time taken (Image.NEAREST+ANTIALIAS): {natime:.3f} sec.")

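def preresize(img, box):
    # Find the smallest power-of-two factor for which at least one side of the
    # image fits within `box`, then let thumbnail() shrink the image by it.
    # On an image whose data is not loaded yet, this lets the file loader
    # (JPEG in particular) decode at a reduced size.
    factor = 1
    while img.size[0] > box[0] * factor and img.size[1] > box[1] * factor:
        factor *= 2
    if factor > 1:
        img.thumbnail((img.size[0] // factor, img.size[1] // factor), Image.NEAREST)
    return img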

thhashes = []
start = time()
for i in imagefiles:
    image = Image.open(i)
    image = preresize(image, (8*4, 8*4))
    image = image.resize((8,8), Image.ANTIALIAS)
    thhashes.append(imagehash.dhash(image))
ttime = time() - start
print(f"Time taken (Image.THUMBNAIL4): {ttime:.3f} sec.")

# Calc average distances
ndist = nadist = thdist = 0
for aa, n, na, th in zip(ahashes, nhashes, nahashes, thhashes):
    ndist += aa - n
    nadist += aa - na
    thdist += aa - th
ndist /= len(ahashes)
nadist /= len(nahashes)
thdist /= len(thhashes)

print(f"Speedup: N {atime/ntime:.2f}, dist {ndist:.2f}")
print(f"Speedup: NA {atime/natime:.2f}, dist {nadist:.2f}")
print(f"Speedup: Th4 {atime/ttime:.2f}, dist {thdist:.2f}")


def save_img(name, fname):
    image = Image.open(fname)
    image.save(name + "_orig.png")
    image.resize((8, 8), Image.NEAREST).resize((64,64), Image.NEAREST).save(name + "_nearest.png")
    image.resize((8, 8), Image.ANTIALIAS).resize((64,64), Image.NEAREST).save(name + "_antialias.png")
    image.resize((8*4, 8*4), Image.NEAREST).resize((8, 8), Image.ANTIALIAS).resize((64,64), Image.NEAREST).save(name + "_nearest+aa.png")
    image = Image.open(fname)
    preresize(image, (32, 32)).resize((8, 8), Image.ANTIALIAS).resize((64,64), Image.NEAREST).save(name + "_thumbnail.png")

save_img("1", imagefiles[0])

0 reactions
JohannesBuchner commented, Feb 1, 2022

A subtlety is that some hashes do hashsize x hashsize, some do (hashsize + 1) x hashsize, but we can spell that out.
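
To illustrate why a difference hash needs the extra column: as far as I understand dhash, it compares each pixel with its horizontal neighbour, so a row of hash_size + 1 pixels yields hash_size bits. The sketch below is a toy version on nested lists; `dhash_bits` is a made-up helper, not imagehash's implementation.

```python
def dhash_bits(grid):
    """Row-wise difference hash of a 2-D list of grey values.

    Each row of N + 1 pixels yields N bits: is a pixel brighter than its
    left-hand neighbour?
    """
    return [[right > left for left, right in zip(row, row[1:])] for row in grid]


hash_size = 2
# hash_size rows, hash_size + 1 columns
grid = [
    [10, 20, 30],
    [30, 20, 10],
]
bits = dhash_bits(grid)
print(bits)  # [[True, True], [False, False]]
```

Hashes that look at each pixel on its own rather than at neighbour pairs can work on a plain hash_size x hash_size grid.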


