I just made this for my own usage, so I'll share it to amortize my Claude Code carbon footprint... lol. I think single-binary CLIs like ffmpeg are gaining importance in this "agent" age, so I figured it might be useful to have one for images.
I understand what you mean, and that's a fair point. However, as John Schulman pointed out in his talk, it is possible to clone the behavior of the model, but it won't work to avoid hallucination since the underlying pretrained models are different. If we clone ChatGPT's behavior (by using its output), we'll get the worst of both worlds: weird output coming from its RLHF step AND hallucination.
That seems like an experimentally-testable prediction - that attempts to "clone ChatGPT's behavior (by using its output)" will necessarily "get the worst of both worlds: weird output coming from its RLHF step AND hallucination".
The reasoning that the results won't be quite as good as RLHF, or result in a perfect 'clone' of ChatGPT's capabilities, seems pretty good to me.
But the idea it won't be helpful at all, especially to projects that are just seeking some incremental advantage? Seems speculative.
In particular, when you read the linked comments from Yoav Goldberg, he outlines a potential RL process that uses automated scoring for non-exact similarity to preferred answers. Using known (or even just 'probably') good answers from ChatGPT output as the inputs to that process seems like it could often offer other models some of the same sort of improvement that ChatGPT obtained via its RLHF.
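To make the "automated scoring for non-exact similarity" idea a bit more concrete, here's a rough sketch of what such a scorer could look like. Token-overlap F1 is my own stand-in for the similarity measure, not something taken from the linked comments or the talk, and the names `token_f1` and `score_samples` are made up for illustration. The reference answer would be a known-good (or 'probably' good) output; the samples would come from the model being trained.

```python
from collections import Counter


def token_f1(candidate: str, reference: str) -> float:
    """Non-exact similarity: F1 over lowercased whitespace tokens.

    Assumption: this is just one possible scoring function, chosen
    because it is simple; a real setup might use something richer.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def score_samples(samples: list[str], reference: str) -> list[tuple[str, float]]:
    """Attach a scalar reward to each sampled answer (hypothetical helper)."""
    return [(s, token_f1(s, reference)) for s in samples]
```

The scores would then serve as the reward signal in whatever RL loop you use; that part is left out here since it depends entirely on the training setup.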
Yeah, just to be clear, I think using ChatGPT to create small datasets for niche models makes total sense. I'm talking about creating foundation models, which is a different thing.
Yes indeed, Yoav Goldberg's post is essentially a good summary of John Schulman's talk, which is excellent. I highly recommend people watch it.
However, my point goes beyond this technical argument. I would argue that even if, by some magical process, we could perfectly replicate GPT-4's behaviour, I still don't think it's a good idea, or at least it's not enough. Don't get me wrong, it would be really handy to have a free version running on our own cluster, but it wouldn't fix the other issues I mentioned.
Of course ... it's a fractal (dimension = 2.05). Fractals exhibit some degree of self-similarity, not perfect self-similarity. In fact, that is the whole beauty of it.