Could you point me to some more information about "visual hashing"? I'm a bit tired of "try this tool because it worked for my set of 100 pictures", but if I could read an explanation of why a given tool/library does what I want, that would be fantastic. The biggest issue in my use case is the sheer number of files.
Right, you're looking for things that work at the scale of 100k separate files or so. Moreover, you seem pretty used to getting bad recommendations, and I know the feeling. An important caveat: all I know about these is what I've chatgpt'd about them.
There's the aforementioned DupeGuru program, which is cross-platform and wields a handful of algorithms. Then there's aHash (average hash), dHash (difference hash), and pHash (perceptual hash). They each make assumptions about which subset of image data is important, pull it out, and compare it, and they're meant to do this quickly and at large scale. All three are accessible from within Python's imagehash library, which requires getting your hands dirty with Python. My understanding is that DupeGuru uses its own custom perceptual hashing methods.
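For what it's worth, here's a minimal sketch of what using imagehash looks like, assuming Pillow and imagehash are installed and with made-up file names; the hash size and distance threshold are just illustrative values you'd want to tune for your own collection:

```python
from PIL import Image
import imagehash

# Each hash function reduces the image to a small fingerprint
# (an 8x8 grid by default), so resized or re-encoded copies
# still land close to each other.
hash_a = imagehash.dhash(Image.open("photo_a.jpg"))
hash_b = imagehash.dhash(Image.open("photo_b.jpg"))

# Subtracting two hashes gives the Hamming distance
# (number of differing bits); small distance = likely near-duplicate.
distance = hash_a - hash_b
if distance <= 5:  # threshold is a guess, tune it for your data
    print(f"likely near-duplicates (distance {distance})")
```

Swap in imagehash.average_hash or imagehash.phash in the same way; the trade-off is roughly speed versus robustness to edits like cropping or brightness changes.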
And although it seems like you need something more specific, the very lazy choice is md5 sum comparison, which is super fast but only tests whether files are byte-for-byte identical copies.
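That approach scales fine to 100k files, since you just bucket paths by their checksum. A rough sketch, assuming a "photos" directory of JPEGs (the directory name and glob pattern are placeholders):

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def md5_of(path, chunk_size=1 << 20):
    # Read in chunks so large files don't blow up memory.
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

groups = defaultdict(list)
for p in Path("photos").rglob("*.jpg"):
    groups[md5_of(p)].append(p)

# Any bucket with more than one path holds exact duplicates.
dupes = {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```

The catch is that a single re-save, resize, or metadata change produces a different md5, which is exactly the gap the perceptual hashes above are meant to cover.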
dHash sounds like a good starting point if I ever get to the point where VisiPics stops working for some reason. It's horribly difficult to replace software that is "just good enough", where all of its problems have known mitigations.