Hacker News

> I don't know how you get a next token predictor that user input can't break out of.

Maybe by adjusting the transformer model to have separate input layers for the control and data paths?
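One hypothetical sketch of what "separate input layers" might mean (this is not an existing architecture; the names, shapes, and channel flags are all illustrative): give the model two embedding tables, one for trusted "control" tokens and one for untrusted "data" tokens, so the same token id lands in a different subspace depending on which channel it arrived on.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, D_MODEL = 100, 16

# Hypothetical: two separate embedding tables, one for "control"
# tokens (system prompt / instructions) and one for "data" tokens
# (untrusted user input). Downstream layers could, in principle,
# learn to treat the two channels differently.
ctrl_embed = rng.normal(size=(VOCAB, D_MODEL))
data_embed = rng.normal(size=(VOCAB, D_MODEL))

def embed(tokens, channels):
    """Look up each token in the table chosen by its channel flag
    (0 = control, 1 = data)."""
    out = np.empty((len(tokens), D_MODEL))
    for i, (tok, ch) in enumerate(zip(tokens, channels)):
        table = ctrl_embed if ch == 0 else data_embed
        out[i] = table[tok]
    return out

# The same token id gets a different representation per channel:
x = embed([7, 7], [0, 1])
assert not np.allclose(x[0], x[1])
```

Whether the rest of the network would actually preserve that separation, rather than learning to mix the channels anyway, is exactly the open question.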




Maybe it's my failing, but I can't imagine what that would look like.

Right now, you train an LLM by showing it lots of text and telling it to come up with the best model for predicting the next word, as accurately as possible across the corpus. Then you give it a chat template so it predicts what an AI assistant would say. Do some RLHF on top of that and you have Claude.
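That first step, the next-token objective, can be sketched in a few lines. This is a toy stand-in (a random bigram table instead of a transformer); the point is just the shifted-by-one cross-entropy idea:

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 50

# Toy "model": a table of logits for the next token given only the
# current one. A real transformer conditions on the whole prefix,
# but the loss is computed the same way.
logits = rng.normal(size=(VOCAB, VOCAB))

def next_token_loss(tokens):
    """Average cross-entropy of predicting token t+1 from token t."""
    total = 0.0
    for cur, nxt in zip(tokens[:-1], tokens[1:]):
        row = logits[cur]
        # numerically stable log-softmax
        m = row.max()
        log_probs = row - m - np.log(np.exp(row - m).sum())
        total += -log_probs[nxt]
    return total / (len(tokens) - 1)
```

Training just nudges the logits to shrink this number over the corpus; every token, trusted or not, flows through the same stream.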

What would a model with multiple input layers look like? What is it training on, exactly?


> by showing it lots of text

When you're "showing it lots of text", where does that "show" bit happen? :)



