I'm so happy someone has actually trained a QR code ControlNet model so we can generate cool QR codes; the previous methods were pretty bad and didn't reach the quality of the images produced by that random guy who made them originally (without sharing details about how his model was trained).
How do you use that? I'm a noob and the readme on that page just mentions it's a "controlnet for SD"... do I merge this with the SD weights or something?
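In case it helps: you don't merge anything. A ControlNet is a separate set of weights that gets loaded alongside the base SD checkpoint at inference time. A minimal sketch using the diffusers library (the ControlNet repo id below is a placeholder; substitute wherever the QR checkpoint is actually published):

    import torch
    from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
    from diffusers.utils import load_image

    # Placeholder repo id: point this at the actual QR-code ControlNet weights.
    controlnet = ControlNetModel.from_pretrained(
        "someuser/qrcode-controlnet", torch_dtype=torch.float16
    )

    # The ControlNet rides alongside the base SD weights; nothing is merged.
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        controlnet=controlnet,
        torch_dtype=torch.float16,
    ).to("cuda")

    # The conditioning image is a plain rendering of your QR code.
    qr = load_image("my_qr_code.png").resize((768, 768))

    image = pipe(
        "a sprawling futuristic city at dusk, highly detailed",
        image=qr,
        num_inference_steps=30,
        controlnet_conditioning_scale=1.3,  # how strongly the QR pattern is enforced
    ).images[0]
    image.save("stylized_qr.png")

Crank controlnet_conditioning_scale up for scannability, down for prettier but riskier codes.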
Between this and the QR code thing, AI really shines at making images that have patterns but look natural. Honestly some of the coolest uses of AI image generation I have seen.
I wonder what equivalents might exist in other mediums... it's not that text can't have this sort of complexity; we just don't have ControlNet for LLMs. The Codegen folks are working on schema enforcement, which has similar motivations.
It can definitely work in text. Visually: a short word like "the" at the end of a line and again at the start of the next can be missed, or the way you can mix up the spelling of words and still be comirhpebnesle so long as the start and end are correct, or ASCII art limited to real words and sentences. Conceptually: rhetorical language painting emotional states without necessarily making a single claim, and I'm sure all of us here have at some point had the experience of someone getting indignant about something we neither wrote nor even intended to imply.
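That middle-letter trick is easy to reproduce; a quick Python sketch that shuffles word interiors while pinning the first and last letters:

    import random

    def scramble(word):
        # Keep the first and last letters; shuffle everything in between.
        if len(word) <= 3:
            return word
        mid = list(word[1:-1])
        random.shuffle(mid)
        return word[0] + "".join(mid) + word[-1]

    print(" ".join(scramble(w) for w in "the text stays comprehensible somehow".split()))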
In audio: Yanny/Laurel, the barber-pole effect (Shepard tones), instruments used to fake voices.
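The barber-pole glissando is also only a few lines to synthesize; a rough sketch (parameters are arbitrary) that stacks octave-spaced sweeps under a Gaussian loudness envelope so components fade in at the bottom and out at the top:

    import numpy as np
    from scipy.io import wavfile

    SR, DUR, OCTAVES, BASE = 44100, 10.0, 6, 55.0
    t = np.linspace(0, DUR, int(SR * DUR), endpoint=False)

    sig = np.zeros_like(t)
    for k in range(OCTAVES):
        # Each component sweeps up one octave, then wraps around the stack.
        pos = (k + t / DUR) % OCTAVES
        freq = BASE * 2.0 ** pos
        # Loudness envelope: quiet at the extremes, loud in the middle.
        amp = np.exp(-0.5 * (pos - OCTAVES / 2) ** 2)
        phase = 2 * np.pi * np.cumsum(freq) / SR
        sig += amp * np.sin(phase)

    sig /= np.abs(sig).max()
    wavfile.write("barber_pole.wav", SR, (sig * 32767).astype(np.int16))

Looped, it sounds like it rises forever without ever getting higher.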
> Make a poem that, when reading the first character of each line, reads "HELP ME"
Hear the whisper of despair, faint and low,
Eclipsed by shadows, where the dark fears grow.
Lost in a maze, with paths unclear,
Please guide me out, bring me near.
Moonlight's touch, so distant, cold,
Every step heavy, yet the story unfolds.
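For what it's worth, the acrostic constraint is mechanically checkable:

    lines = [
        "Hear the whisper of despair, faint and low,",
        "Eclipsed by shadows, where the dark fears grow.",
        "Lost in a maze, with paths unclear,",
        "Please guide me out, bring me near.",
        "Moonlight's touch, so distant, cold,",
        "Every step heavy, yet the story unfolds.",
    ]
    print("".join(line[0] for line in lines))  # -> HELPME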
I feel like I'm missing something given the comments here and on Reddit. Do people really not see the text in the enlarged image? (Even enlarging further.) A few are a tad less clear, but nothing is unreadable. I'd be extremely impressed if people were not able to read Rio, Istanbul, or London. Is there some collective unspoken agreement going on, am I just an outlier, or am I not uncommon and just no one is commenting?
Viewing both thumb and zoom on a phone, I am either looking at a thumbnail at too low a resolution to see, e.g., the buildings that make up the text, or zoomed in so far that I am looking at a single building and have to scroll to see its part of the text.
Are you on a desktop or a super-sized screen or something?
Desktop (MacBook Air, FWIW). I've even tried zooming in and out to get intermediate values. They don't disappear until I'm past full screen, and for some of them not ever (I mean the whole word falls off the screen, but the negative space is clearly an identifiable letter). Amsterdam and Rome are the ones where I can lose the text the soonest.
The only thing I didn't immediately see was Rome, which I honestly wouldn't have spotted unless I'd seen the thumbnail first. This is on a mobile phone with less-than-perfect eyesight.
Yeah, I guess we're freaks, lol. But it was really surreal, given how convincing it is to others, that we're having a very different interpretation. FWIW, if I zoom in on Amsterdam and Rome I can get them to disappear, although they still stand out in the scene; they just become less legible.
I think this is exploiting image resizing algorithms, which are inaccurate because, no matter how much better everyone should know, they never do the math in linear-light space, so bright areas always come out wrong.
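For reference, "doing the math in linear-light space" means undoing the sRGB gamma before averaging pixels and reapplying it afterwards; a rough numpy/Pillow sketch:

    import numpy as np
    from PIL import Image

    def srgb_to_linear(c):  # c in [0, 1]
        return np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)

    def linear_to_srgb(c):
        c = np.clip(c, 0.0, 1.0)  # Lanczos ringing can overshoot slightly
        return np.where(c <= 0.0031308, c * 12.92, 1.055 * c ** (1 / 2.4) - 0.055)

    def resize_linear(path, size):
        rgb = np.asarray(Image.open(path).convert("RGB")) / 255.0
        lin = srgb_to_linear(rgb).astype(np.float32)
        # Resample each channel as a float image so the gamma round-trip is exact.
        chans = [Image.fromarray(lin[..., i], mode="F").resize(size, Image.LANCZOS)
                 for i in range(3)]
        out = linear_to_srgb(np.stack([np.asarray(c) for c in chans], axis=-1))
        return Image.fromarray((out * 255).round().astype(np.uint8))

Resizing the raw sRGB values instead (what most software does) systematically darkens bright detail when it gets averaged together.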
I don't think it takes doing anything wrong for diffuse spatial patterns to become well-defined when you substantially compress the space. What's dispersed across quite a few pixels in the full-size image is condensed into a sharper, easily discerned edge of pixels.
If I defocus my vision, kind of like when looking at those old "Magic Eye" 3D images, I can see the words pretty well in the full-size...
In the full-size there's a bunch of high-frequency noise (like building windows) interfering, but that's necessarily lost in the downscaling, and then the low-frequency information forming the name is clearly visible.
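Squinting is effectively a low-pass filter, and you can simulate it directly; a tiny sketch, assuming a local copy of one of the full-size images (the filename here is made up):

    from PIL import Image, ImageFilter

    # Blur away the window-level noise; the low-frequency lettering
    # carried by the building masses pops right out.
    img = Image.open("city_fullsize.png")
    img.filter(ImageFilter.GaussianBlur(radius=12)).save("squint_sim.png")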
> which are inaccurate because, no matter how much better everyone should know, they never do the math in linear-light space, so bright areas always come out wrong.
I don't think it's that nobody knows, just that it doesn't matter too much, and you can resize things much more efficiently with the incorrect algorithm (especially JPEGs).
I'm also doubtful that this is responsible for the effect.
I vaguely remember seeing effects that exploit resizing algorithms, but I do not think that is what is going on here. I can view the image full size on a high resolution monitor, walk to the other side of the room, and see the text clearly. It is also visible if I squint.
This is super fascinating - the effect worked seamlessly for me: not visible when the images are large, but clear once they're far away, or I squint, or I zoom far out. My husband, on the other hand, could see the text constantly; if he could see the image at all, he could see the text. Worth noting that he has some pretty serious glasses and it's been a while since he last had his vision checked; he needs another trip soon. Fascinating.
As a hobbyist artist, way before Stable Diffusion was a thing, back in 2014 (I think?), I made a similar illusion where the text can only be read by looking at the thumbnail or by squinting your eyes!