
I wonder if the key behind the quality of the MidJourney models, and these models, is less about size + architecture and more about the quality of the images they were trained on.

It looks like this is the case for LLMs: the quality of the training data has a significant impact on the output quality of the model, which makes sense.

So the real magic is in designing a system to curate that high quality data.



Midjourney unquestionably has heavy data set curation and uses RLHF from users.

You don't have to speculate on this, as you can see that custom models for SDXL, for instance, perform vastly better than vanilla SDXL at the same number of parameters. It's all data set and tagging.


Custom models perform vastly better at the tasks they are finetuned to do.


That is technically true, but when the base model is wasting parameter information on poorly tagged, watermarked stock art and other garbage images, it's not really a meaningful distinction. Better data makes for better models; nobody cares about how well a model outputs trash.


Ok, but you're severely misrepresenting the importance of things. Base SDXL is a fine model. Base SDXL is going to be much better than a materially smaller model that you've retrained with "good data".


SDXL used RLHF too


It's the quality of the image-text pair, not the image alone. But Midjourney is not a model, it's a suite of models that work in conjunction. They have an LLM in the front to optimize the user prompts, they use SAM models, ControlNet models for poses that are in high demand, and so much more. That's why you can't really compare foundation models anymore, because there are none.


No, it’s definitely the size. Tiny LLMs are shit. Stable Diffusion 3’s problem is not that its training set was wildly different, it’s that it’s just too small (because the one released so far is not the full size).

You can get better results with better data, for sure. And better architecture, for sure. But raw size is really important: the difference in quality between models, all else held equal, is HUGE and obvious if you play with them.


I would agree - Midjourney is getting free labour, since many of their generations are not in secret mode (which requires a pro/mega subscription), so prompts and outputs are visible to everyone. Midjourney rewards users for rating those generations. I wouldn't be surprised if there are some bots on their Discord that are scraping that data for training their own models.


Are the prompts of pro users secret to Midjourney?



