
People are already going full Leeroy Jenkins with this stuff, and OpenAI and other labs are snarfing up their usage data. Hopefully, with their brave sacrifice, the labs can figure out all the security pitfalls before it becomes common enough that someone with a clever jailbreak ends up pulling off a billion-dollar heist or ordering pizza for half the country.

It's absolutely not safe yet. You can effectively copy and paste Pliny prompts and pwn any of the frontier lab models. Anyone with a little time and creativity can tailor a unique one and set hidden-text traps for AI browsers or agents, and depending on what access you've given the software, it could be very dangerous.





Great time to be an offensive security researcher specialising in adversarial attacks on LLMs.

Yeah - the red team folks probably have one of the most fun jobs in the world right now.

Depends on your definition of "fun"


