RL works when you have some kind of verifier or ground truth, e.g. for math (and, to some extent, for coding, if you have tests and/or a type checker). You can also do it for simulations. This paper focuses on math and "embodied agent control" (i.e. simulation).
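To make the "verifier" idea concrete, here's a minimal sketch of what a verifiable reward can look like. This is my own illustration, not what the paper does. The "Answer: <number>" output convention and the names `math_reward` and `code_reward` are hypothetical. The point is that the reward comes from a mechanical check against ground truth (an exact answer, or unit tests) rather than from a learned judge.

```python
import re
from fractions import Fraction
from typing import Any, Callable, Optional, Sequence, Tuple

# Hypothetical convention: the completion ends with "Answer: <integer, decimal, or a/b>".
ANSWER_RE = re.compile(r"Answer:\s*(-?\d+(?:/\d+|\.\d+)?)\s*$")


def extract_answer(completion: str) -> Optional[Fraction]:
    """Parse the final numeric answer; return None if it is missing or malformed."""
    match = ANSWER_RE.search(completion.strip())
    if match is None:
        return None
    try:
        return Fraction(match.group(1))
    except (ValueError, ZeroDivisionError):
        return None


def math_reward(completion: str, ground_truth: str) -> float:
    """Binary reward: 1.0 if the parsed answer equals the reference exactly, else 0.0."""
    answer = extract_answer(completion)
    return 1.0 if answer is not None and answer == Fraction(ground_truth) else 0.0


def code_reward(candidate: Callable[..., Any],
                tests: Sequence[Tuple[tuple, Any]]) -> float:
    """Fraction of (args, expected) test cases the candidate passes.

    Exceptions count as failures. A real setup would sandbox untrusted code.
    """
    if not tests:
        return 0.0
    passed = 0
    for args, expected in tests:
        try:
            if candidate(*args) == expected:
                passed += 1
        except Exception:
            pass
    return passed / len(tests)
```

For example, `math_reward("1/2 + 1/4 = ...\nAnswer: 3/4", "0.75")` gives 1.0, since both sides parse to the same `Fraction`. Tests only partially pin down correct behavior, which is why coding is only "to some extent" verifiable. A candidate can pass every test and still be wrong elsewhere.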

