I see you already mention diffusion - iirc there was a result not too long ago t...

sdpmas · 2026-03-05T00:37:09 1772671029

diffusion is promising, but it's still an open question how data-efficient it is compared to AR. in practice, you can also train AR forever with high enough regularization, so let's see.

_0ffh · 2026-03-05T00:39:47 1772671187

Yes, it could go either way, of course.

Still, just for reference, here's the paper I remembered: https://arxiv.org/pdf/2507.15857

sdpmas · 2026-03-05T00:55:33 1772672133

thanks, here's another one: https://arxiv.org/abs/2511.03276