How does such a distillation work in theory? They don’t have weights from OpenAI... | Hacker News


blackeyeblitzar on Jan 28, 2025 | on: The Illustrated DeepSeek-R1

How does such a distillation work in theory? They don’t have weights from OpenAI’s models, and can only call their APIs, right? So how can they actually build off of it?

moralestapia on Jan 28, 2025

Like RLHF but the HF part is GPT4 instead.
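One way to read this: the preference labels that humans supply in RLHF come from a stronger "teacher" model instead, reached only through its API (often called RLAIF). Below is a minimal sketch of the labeling step under that reading. `teacher_score` is a hypothetical stand-in for the API call (a toy offline heuristic here), and nothing in the thread confirms that this is the procedure anyone actually used.

```python
from itertools import combinations


def teacher_score(prompt: str, response: str) -> float:
    """HYPOTHETICAL stand-in for an API call asking the teacher to rate a response.

    A toy heuristic is used so the sketch runs offline: reward word overlap
    with the prompt and lightly penalize length.
    """
    prompt_words = set(prompt.lower().split())
    overlap = len(prompt_words & set(response.lower().split()))
    return overlap - 0.01 * len(response)


def build_preference_pairs(prompt: str, candidates: list[str]) -> list[dict]:
    """Turn teacher scores into (chosen, rejected) pairs, skipping ties."""
    scored = [(teacher_score(prompt, c), c) for c in candidates]
    pairs = []
    for (score_a, a), (score_b, b) in combinations(scored, 2):
        if score_a == score_b:
            continue  # the teacher expressed no preference
        chosen, rejected = (a, b) if score_a > score_b else (b, a)
        pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return pairs


pairs = build_preference_pairs(
    "how does distillation work",
    ["distillation trains a student on teacher outputs", "bananas are yellow"],
)
print(pairs)
```

The resulting pairs would then feed a reward model or a preference-optimization step for the student. No teacher weights are needed, only the teacher's judgments.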

KarraAI on Jan 28, 2025

How do you ensure the student model learns robust generalizations rather than just surface-level mimicry?

moralestapia on Jan 28, 2025

No idea as I don't work on that, but my guess would be that the higher the 'n' the more model A approaches model B.
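A toy illustration of that guess, assuming 'n' means the number of teacher samples the student is fit on (the comment does not define it). The "teacher" here is a made-up three-token distribution and the "student" is just the empirical frequencies, so this only shows the student matching the teacher on the sampled distribution. It says nothing about generalization in a real language model.

```python
import random
from collections import Counter

TOKENS = ["a", "b", "c"]
TEACHER_PROBS = [0.55, 0.30, 0.15]  # made-up "model B" output distribution


def fit_student(n: int, seed: int = 0) -> dict[str, float]:
    """Fit "model A" as the empirical token frequencies of n teacher samples."""
    rng = random.Random(seed)
    samples = rng.choices(TOKENS, weights=TEACHER_PROBS, k=n)
    counts = Counter(samples)
    return {tok: counts[tok] / n for tok in TOKENS}


def total_variation(student: dict[str, float]) -> float:
    """Total variation distance between the student and the teacher."""
    return 0.5 * sum(abs(student[t] - p) for t, p in zip(TOKENS, TEACHER_PROBS))


for n in (10, 1_000, 100_000):
    print(n, round(total_variation(fit_student(n)), 4))
```

The printed distance shrinks as n grows, which is the sense in which A "approaches" B here.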
