Hacker News

If anyone wants an example of an actual jailbreak in the wild that uses this technique (NSFW):

https://www.reddit.com/r/persona_AI/comments/1nu3ej7/the_spi...

This doesn't work with GPT-5, 4o, or really any of the models that do preclassification and routing, because they filter both the input and the output. It does work with the 4.1 model, which doesn't seem to do any post-generation filtering or any reasoning.



That description is obviously written by an AI. Has anyone actually checked whether it's an accurate description rather than just yet another LLM Making Stuff Up?

(Also, I don't think there's anything very NSFW on the far end of that link, although it describes something used for making NSFW writing.)


It looks like a healthy mix of cargo cult and mental illness



