
A quick Google search turns up terms such as "sparse attention", a family of techniques used to avoid the quadratic runtime of standard attention.

I don't know whether Anthropic has revealed such details, since AI research is getting more and more secretive, but the architectural tricks definitely exist.



Then you need to dig a little deeper. No one applies sparse attention at inference time to a model that wasn't trained with it. It has to be used at training time as well, because otherwise task performance degrades too much.
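
To make the thread concrete, here is a minimal sketch of one common sparse-attention pattern, sliding-window (local) attention, in PyTorch. The function name, shapes, and window size are my own illustrative choices, not anything Anthropic has published. For clarity the sketch still materializes the full (n, n) score matrix and only masks it; a production kernel would compute just the banded entries to actually get the O(n * window) cost.

    import torch
    import torch.nn.functional as F

    def sliding_window_attention(q, k, v, window):
        # Each query position i attends only to keys j with i - window < j <= i,
        # so a banded kernel does O(n * window) work instead of O(n^2).
        # This sketch builds the dense score matrix and masks it for readability;
        # real implementations compute only the entries inside the band.
        n, d = q.shape[-2], q.shape[-1]
        scores = q @ k.transpose(-2, -1) / d ** 0.5
        idx = torch.arange(n)
        # True where a position is outside the causal local window.
        outside = (idx[None, :] > idx[:, None]) | (idx[:, None] - idx[None, :] >= window)
        scores = scores.masked_fill(outside, float('-inf'))
        return F.softmax(scores, dim=-1) @ v

    q = k = v = torch.randn(1, 8, 64)           # (batch, seq_len, head_dim)
    out = sliding_window_attention(q, k, v, 4)  # out.shape == (1, 8, 64)

The same mask would be applied during training, which is the point made above: the model learns to route information through the restricted pattern, so performance doesn't collapse when the pattern is enforced at inference.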





