
Could you help me understand the importance of RL fine-tuning? What can it accomplish that regular fine-tuning can't? What's a use case for it?


From my experience there are three key issues with agents today:

1. They usually don't complete the right sequence of steps required to finish a task, even when using our human-defined frameworks (ReAct, ReWOO, supervisor-worker, teams of multi-agents, etc.)

2. They get lost easily, forgetting what they were doing or repeating the same tasks over and over in a loop (bad planning)

3. They exit early, believing they have completed the task when they have not (bad evaluation)
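Failure modes 2 and 3 can be made concrete with a minimal agent loop. This is a hedged sketch, not any particular framework's implementation: `plan_step` and `is_done` are hypothetical callables standing in for the model's planner and an external task-completion check.

```python
from collections import Counter

def run_agent(plan_step, is_done, max_steps=20, loop_threshold=3):
    """Minimal agent loop guarding against two failure modes:
    repeating the same action forever (bad planning) and stopping
    before the task is actually finished (bad evaluation).
    `plan_step` and `is_done` are hypothetical stand-ins for the model."""
    history = []
    counts = Counter()
    for _ in range(max_steps):
        action = plan_step(history)
        counts[action] += 1
        # Guard against failure mode 2: the same action chosen repeatedly.
        if counts[action] > loop_threshold:
            return history, "aborted: stuck in a loop"
        history.append(action)
        # Guard against failure mode 3: verify completion externally
        # rather than trusting the model's own "I'm done" claim.
        if is_done(history):
            return history, "done"
    return history, "aborted: step budget exhausted"
```

For example, a planner that always returns the same action trips the loop guard after `loop_threshold` repeats, while a planner that makes progress runs until the external check passes.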

The jump in reasoning ability from 4o to o3 will enable a drastic improvement in planning and execution within our human-defined frameworks.

But, more importantly, I believe RL fine-tuning will enable the model to learn better general approaches to planning and executing the steps needed to complete work. This is Sutton's bitter lesson at work.
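The distinction can be sketched numerically: supervised fine-tuning imitates individual steps, while RL fine-tuning optimizes whole trajectories against a task-level reward. Below is a toy tabular REINFORCE example (my own illustration, not OpenAI's RFT method) on a two-step task where only the full sequence "plan" then "act" earns reward.

```python
import math
import random

# Toy two-step task: reward is sparse and outcome-level -- the policy
# gets credit only when the entire trajectory succeeds, which is the
# kind of signal RL fine-tuning can learn from.
ACTIONS = ["plan", "act"]

def softmax(x):
    m = max(x)
    e = [math.exp(v - m) for v in x]
    s = sum(e)
    return [v / s for v in e]

def reward(trajectory):
    # 1.0 only if the whole task was completed in the right order.
    return 1.0 if trajectory == ["plan", "act"] else 0.0

def train(episodes=2000, lr=0.1, seed=0):
    random.seed(seed)
    # One preference vector per step (a tiny stand-in for model logits).
    prefs = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(episodes):
        probs = [softmax(p) for p in prefs]
        traj = [random.choices(ACTIONS, weights=probs[t])[0] for t in range(2)]
        r = reward(traj)
        # REINFORCE update: nudge each step's preferences toward the
        # actions taken, scaled by the trajectory-level reward.
        for t, a in enumerate(traj):
            i = ACTIONS.index(a)
            for j in range(2):
                grad = (1.0 if j == i else 0.0) - probs[t][j]
                prefs[t][j] += lr * r * grad
    return prefs
```

After training, the step-0 policy concentrates on "plan" and the step-1 policy on "act", even though no per-step labels were ever given; that is the kind of trajectory-level credit assignment step-by-step supervised fine-tuning cannot provide.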

For me, desktop automation is the killer app for RL fine-tuning, rather than better reasoning in chatbot apps and APIs.

When OpenAI releases its desktop agent capabilities built on this, hopefully in January, I think we're going to see another ChatGPT moment.

Even if not, the ability to easily train the system to complete your tasks successfully, with full desktop access, is going to be a major unlock for enterprises.

More on RL fine-tuning here: https://openai.com/form/rft-research-program/



