Getting completely different hashes of almost identical images

Hi! I'm trying to compare an unmodified image, as taken by the camera, with a photoshopped version of it (tweaked histogram and slightly changed white balance). I get distances above 25 when using the code from the examples: hash = imagehash.phash(Image.open(path))

But if I modify the code like this:

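import cv2
import imagehash
from PIL import Image

img = cv2.imread(path)      # NumPy array as returned by OpenCV
img = Image.fromarray(img)  # wrapped as a PIL image without reordering channels
hash = imagehash.phash(img)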

I get a distance of 0. It looks like this might be caused by different color spaces, or something like that. I hope this info helps somebody.
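One concrete difference between the two loading paths is channel order: cv2.imread returns pixels in BGR order, while Image.fromarray treats a three-channel array as RGB. The thread does not establish whether this explains the distances reported here. The sketch below uses a hand-made NumPy array as a stand-in for cv2.imread output (an assumption, so that it runs without OpenCV or an image file) and shows that the grayscale value changes with the channel order.

```python
import numpy as np
from PIL import Image

# Stand-in for cv2.imread output (assumption): one pure-red pixel in BGR order.
bgr = np.array([[[0, 0, 255]]], dtype=np.uint8)

# Wrapped as-is, PIL reads the channels as RGB, so the pixel becomes blue.
as_loaded = Image.fromarray(bgr)

# Reversing the last axis restores RGB order, so the pixel is red again.
rgb = np.ascontiguousarray(bgr[:, :, ::-1])
corrected = Image.fromarray(rgb)

gray_as_loaded = as_loaded.convert("L").getpixel((0, 0))
gray_corrected = corrected.convert("L").getpixel((0, 0))
print(gray_as_loaded, gray_corrected)  # the two grayscale values differ
```

If both images being compared go through the same loading path, the swap applies to both and may matter less; mixing the two paths is where it is most likely to show up.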

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 16 (9 by maintainers)

Top GitHub Comments

JohannesBuchner commented, Jun 22, 2018 (1 reaction)

[Images: testl, testl2]

If you flip between them, one has several bright spots. That might have to do with the transparency. You could try placing the image on a white background, converting it to RGB (without the alpha channel), and seeing whether that makes a difference.
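A minimal sketch of that suggestion using Pillow, assuming the image may carry an alpha channel. The helper name flatten_onto_white is hypothetical and not part of imagehash or Pillow; the tiny transparent image only stands in for a real PNG.

```python
from PIL import Image

def flatten_onto_white(img):
    """Composite an image over an opaque white background and drop alpha."""
    rgba = img.convert("RGBA")
    background = Image.new("RGBA", rgba.size, (255, 255, 255, 255))
    # alpha_composite places the second image over the first.
    return Image.alpha_composite(background, rgba).convert("RGB")

# Stand-in for a PNG with transparency (assumption): fully transparent pixels.
transparent = Image.new("RGBA", (4, 4), (0, 0, 0, 0))
flat = flatten_onto_white(transparent)
print(flat.mode, flat.getpixel((0, 0)))
```

Hashing the flattened image instead of the original keeps transparent regions from being read as whatever color happens to sit underneath the alpha channel.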

With hash1 - hash2 you can compute the Hamming distance between the two hashes. For your application, you might want to accept matches up to a threshold greater than 0.

Hyperclaw79 commented, Jun 22, 2018 (1 reaction)

@JohannesBuchner As you suggested, I’ve tried printing out hash.hash to see the differences but I am not able to proceed using that info.

https://cdn.discordapp.com/attachments/408321293960871956/459713247415894016/PokecordSpawn.jpg:
[[False  True False  True  True False  True False]
 [ True  True  True False False  True False False]
 [ True False  True False False  True False False]
 [ True False False False  True  True False False]
 [ True  True  True False  True  True False False]
 [False  True  True False False False  True False]
 [ True  True  True  True False False False False]
 [ True False  True  True False False False False]]

https://www.pokecord.com/assets/sSchDWVBILou.png:
[[ True  True False  True  True False  True  True]
 [ True  True  True False False  True  True False]
 [ True False  True False False  True False False]
 [ True False False False  True  True False False]
 [ True  True  True False  True  True False False]
 [False  True  True False False False  True False]
 [ True False  True  True False False False False]
 [False False  True  True False False  True  True]]

Even though the images look similar to the naked eye, they are getting hashed differently. Is there any way for me to use the hash.hash information to fix the detection?
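One way to read the two arrays above is to count the positions where they disagree, which is the same Hamming distance that hash1 - hash2 reports. The sketch below transcribes the printed arrays as bit strings (1 for True, 0 for False) and counts the differing bits in plain Python. MAX_DISTANCE is a hypothetical threshold used only for illustration.

```python
# Rows transcribed from the two arrays printed above (1 = True, 0 = False).
jpg_bits = ["01011010", "11100100", "10100100", "10001100",
            "11101100", "01100010", "11110000", "10110000"]
png_bits = ["11011011", "11100110", "10100100", "10001100",
            "11101100", "01100010", "10110000", "00110011"]

# Hamming distance: number of positions where the two hashes disagree.
distance = sum(
    a != b
    for row_a, row_b in zip(jpg_bits, png_bits)
    for a, b in zip(row_a, row_b)
)

MAX_DISTANCE = 10  # hypothetical threshold, to be tuned on real data
is_match = distance <= MAX_DISTANCE
print(distance, is_match)  # 7 True
```

The two hashes differ in 7 of 64 bits, so an exact-equality check fails while a small non-zero threshold would treat the pair as a match.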
