Could you point me to some more information about "visual hashing"? I'm a bit tired of "try this tool because it worked for my set of 100 pictures", but if I could read an explanation of why a given tool/library does what I want, that would be fantastic. The biggest issue in my use case is the sheer number of files.
Right, you're looking for things that work at the scale of 100k separate files or so. Moreover, you seem pretty used to getting bad recommendations, and I know the feeling. An important caveat: all I know about these is what I've chatgpt'd about them.
There's the aforementioned DupeGuru program, which is cross-platform and wields a handful of algorithms. Then there's aHash (average hash), dHash (difference hash), and pHash (perceptual hash). They each make assumptions about which subset of image data is important, pull it out, and compare it, and they're meant to do this quickly and at large scale. All three are accessible from within Python's imagehash library, which requires getting your hands dirty with Python. My understanding is that DupeGuru uses its own custom perceptual hashing methods.
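For what it's worth, here's a minimal sketch of what using imagehash looks like, assuming Pillow and imagehash are installed and with made-up file names; the hash size and distance threshold are just illustrative values you'd want to tune for your own collection:

```python
from PIL import Image
import imagehash

# Each hash function reduces the image to a small fingerprint
# (an 8x8 grid by default), so resized or re-encoded copies
# still land close to each other.
hash_a = imagehash.dhash(Image.open("photo_a.jpg"))
hash_b = imagehash.dhash(Image.open("photo_b.jpg"))

# Subtracting two hashes gives the Hamming distance
# (number of differing bits); small distance = likely near-duplicate.
distance = hash_a - hash_b
if distance <= 5:  # threshold is a guess, tune it for your data
    print(f"likely near-duplicates (distance {distance})")
```

Swap in imagehash.average_hash or imagehash.phash in the same way; the trade-off is roughly speed versus robustness to edits like cropping or brightness changes.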
And although it seems like you need something more specific, the very lazy choice is md5 sum comparison, which is super fast but only tests whether files are byte-for-byte identical copies.
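That approach scales fine to 100k files, since you just bucket paths by their checksum. A rough sketch, assuming a "photos" directory of JPEGs (the directory name and glob pattern are placeholders):

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def md5_of(path, chunk_size=1 << 20):
    # Read in chunks so large files don't blow up memory.
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

groups = defaultdict(list)
for p in Path("photos").rglob("*.jpg"):
    groups[md5_of(p)].append(p)

# Any bucket with more than one path holds exact duplicates.
dupes = {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```

The catch is that a single re-save, resize, or metadata change produces a different md5, which is exactly the gap the perceptual hashes above are meant to cover.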
dHash sounds like a good starting point if I ever get to the point where VisiPics stops working for some reason. It's horribly difficult to replace software that is "just good enough", where all of its problems have known mitigations.