
reissbaker on Dec 23, 2024, on: Offline Reinforcement Learning for LLM Multi-Step ...

RL works when you have some kind of verifier or ground truth, e.g. for math (and, to some extent, coding, if you have tests and/or a type checker). You can also do it for simulations. This paper focuses on math and "embodied agent control" (i.e. simulation).
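To make "verifier or ground truth" concrete, here is a minimal sketch of what such reward functions could look like. It is not taken from the paper: the names `math_reward` and `code_reward`, and the convention that the candidate code defines a function called `solve`, are assumptions made purely for illustration.

```python
from fractions import Fraction


def math_reward(model_answer: str, ground_truth: str) -> float:
    """Return 1.0 if the answer equals the known-correct value, else 0.0."""
    try:
        # Fraction parses "3", "0.5" and "1/2", so equivalent forms compare equal.
        same = Fraction(model_answer.strip()) == Fraction(ground_truth.strip())
    except (ValueError, ZeroDivisionError):
        return 0.0  # unparseable answers earn no reward
    return 1.0 if same else 0.0


def code_reward(candidate_source: str, tests) -> float:
    """Return the fraction of (args, expected) test cases the candidate passes.

    Assumes (hypothetically) that the candidate defines a function `solve`.
    A real setup would run untrusted code in a sandbox, not a bare exec().
    """
    namespace = {}
    try:
        exec(candidate_source, namespace)
        solve = namespace["solve"]
    except Exception:
        return 0.0  # code that fails to load, or lacks `solve`, earns nothing
    passed = 0
    for args, expected in tests:
        try:
            if solve(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crashing test case counts as a failure
    return passed / len(tests) if tests else 0.0
```

Both functions return a number that can be computed without a human in the loop, which is the property that makes these domains amenable to RL. A type checker or a simulator's success signal would slot into the same role.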