RL works when you have some kind of verifier or ground truth to score outputs against, e.g. for math (and, to some extent, for coding, if you have tests and/or a type checker). You can also do it in simulations. This paper focuses on math and "embodied agent control" (i.e. simulation).
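
To make the "verifier" idea concrete, here is a minimal sketch of what a ground-truth reward for math could look like. This is not the paper's method. The function names (`parse_number`, `math_reward`) and the binary 0/1 reward are assumptions made for illustration, and it only handles answers that are plain numbers or fractions.

```python
from fractions import Fraction
from typing import Optional


def parse_number(text: str) -> Optional[Fraction]:
    """Parse a final answer such as '3/4' or '0.75'; return None if it is not a number."""
    try:
        return Fraction(text.strip())
    except (ValueError, ZeroDivisionError):
        return None


def math_reward(model_answer: str, ground_truth: str) -> float:
    """Hypothetical verifier: 1.0 if the answer equals the ground truth exactly, else 0.0."""
    predicted = parse_number(model_answer)
    expected = parse_number(ground_truth)
    if predicted is None or expected is None:
        # Unparsable answers earn no reward (an assumption of this sketch).
        return 0.0
    return 1.0 if predicted == expected else 0.0
```

Comparing exact fractions rather than floats means `0.75` and `3/4` count as the same answer without a tolerance threshold. A coding verifier would presumably play the same role by running tests or a type checker instead of comparing numbers.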