Elegant architecture, trained from scratch, excels at image editing. This looks ... | Hacker News


ed on Oct 30, 2024 | on: Show HN: AI OmniGen – AI Image Generator with Cons...

Elegant architecture, trained from scratch, excels at image editing. This looks very interesting!

From https://arxiv.org/html/2409.11340v1

> Unlike popular diffusion models, OmniGen features a very concise structure, comprising only two main components: a VAE and a transformer model, without any additional encoders.

> OmniGen supports arbitrarily interleaved text and image inputs as conditions to guide image generation, rather than text-only or image-only conditions.

> Additionally, we incorporate several classic computer vision tasks such as human pose estimation, edge detection, and image deblurring, thereby extending the model’s capability boundaries and enhancing its proficiency in complex image generation tasks.

This enables prompts for edits like: "|image_1| Put a smile face on the note." or "The canny edge of the generated picture should look like: |image_1|"

> To train a robust unified model, we construct the first large-scale unified image generation dataset X2I, which unifies various tasks into one format.
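To make "arbitrarily interleaved text and image inputs" concrete, here is a minimal sketch, assuming a bare `|image_N|` placeholder syntax modeled on the two example prompts above (OmniGen's real prompt format may wrap placeholders differently). `split_interleaved` and `ImageRef` are hypothetical names, not OmniGen code. The sketch only splits one prompt string into an ordered mix of text segments and image slots. Presumably a model like the one described then turns that mix into a single input sequence for the transformer, with text tokenized and images encoded by the VAE.

```python
import re
from dataclasses import dataclass
from typing import List, Union

# Hypothetical placeholder syntax, modeled on prompts like "|image_1| Put a smile face on the note."
PLACEHOLDER = re.compile(r"\|image_(\d+)\|")


@dataclass
class ImageRef:
    index: int  # 1-based index into the list of images supplied with the prompt


Segment = Union[str, ImageRef]


def split_interleaved(prompt: str, num_images: int) -> List[Segment]:
    """Split a prompt into text segments and image references, preserving order."""
    segments: List[Segment] = []
    pos = 0
    for match in PLACEHOLDER.finditer(prompt):
        # Text between the previous placeholder (or start) and this one
        text = prompt[pos:match.start()].strip()
        if text:
            segments.append(text)
        index = int(match.group(1))
        if not 1 <= index <= num_images:
            raise ValueError(
                f"prompt references image_{index}, but only {num_images} image(s) were given"
            )
        segments.append(ImageRef(index))
        pos = match.end()
    # Trailing text after the last placeholder
    tail = prompt[pos:].strip()
    if tail:
        segments.append(tail)
    return segments


if __name__ == "__main__":
    print(split_interleaved("|image_1| Put a smile face on the note.", 1))
    print(split_interleaved(
        "The canny edge of the generated picture should look like: |image_1|", 1))
```

The first call prints `[ImageRef(index=1), 'Put a smile face on the note.']`. The image slot can appear anywhere in the prompt, which is the point of interleaved conditioning: the same format covers editing (image first, instruction after) and control-style prompts (instruction first, reference image after).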

nairoz on Oct 30, 2024

> trained from scratch

Not exactly. They mention starting from the VAE from Stable Diffusion XL and the Transformer from Phi-3.

Looks like these LLMs can really be used for anything.

yieldcrv on Oct 31, 2024

Pretty cool. ComfyUI and its community are too cumbersome for me and still result in too much throwaway content.
