OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision
Abstract
Instruction-guided image editing methods have demonstrated significant potential by training diffusion models on automatically synthesized or manually annotated image editing pairs. However, these methods remain far from practical, real-life applications. We identify three primary challenges contributing to this gap. First, existing models have limited editing skills because of the biased synthesis process. Second, these methods are trained on datasets with a high volume of noise and artifacts, a consequence of simple filtering methods such as CLIP-score. Third, all these datasets are restricted to a single low resolution and a fixed aspect ratio, which limits their versatility in real-world use cases. In this paper, we present OmniEdit, an omnipotent editor that handles seven different image editing tasks with any aspect ratio seamlessly. Our contribution is fourfold: (1) OmniEdit is trained with supervision from seven different specialist models to ensure task coverage; (2) we use importance sampling based on scores provided by large multimodal models (such as GPT-4o) instead of CLIP-score to improve data quality; (3) we propose a new editing architecture called EditNet to greatly boost the editing success rate; and (4) we provide images with different aspect ratios to ensure that our model can handle any image in the wild. We have curated a test set containing images of different aspect ratios, accompanied by diverse instructions covering different tasks. Both automatic and human evaluations demonstrate that OmniEdit significantly outperforms all existing models. Our code, dataset, and model will be available at https://tiger-ai-lab.github.io/OmniEdit/
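The abstract does not spell out how the multimodal-model scores drive the sampling, so the following is only a rough sketch of one plausible reading: candidate editing pairs are drawn with probability proportional to a quality score, after dropping those below a cutoff. The function name `sample_by_quality`, the `EditPair` tuple layout, the `min_score` cutoff, and the score scale are all assumptions made for illustration, not details taken from the paper or its released code.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical record layout: (source image path, instruction, edited image path).
EditPair = Tuple[str, str, str]


def sample_by_quality(
    pairs: Sequence[EditPair],
    scores: Sequence[float],
    k: int,
    min_score: float = 0.0,
    seed: int = 0,
) -> List[EditPair]:
    """Draw k pairs with probability proportional to their quality score.

    `scores` stands in for ratings from a large multimodal model; how those
    ratings are produced is outside this sketch. Sampling is done with
    replacement, so a high-scoring pair may appear more than once.
    """
    if len(pairs) != len(scores):
        raise ValueError("pairs and scores must have the same length")

    # Drop pairs below the cutoff, and any with a non-positive weight.
    kept = [(p, s) for p, s in zip(pairs, scores) if s >= min_score and s > 0]
    if not kept:
        return []

    items = [p for p, _ in kept]
    weights = [s for _, s in kept]
    rng = random.Random(seed)  # fixed seed keeps the draw reproducible
    return rng.choices(items, weights=weights, k=k)
```

A call such as `sample_by_quality(pairs, scores, k=1000, min_score=5.0)` would favor highly rated pairs while never selecting one rated below 5. Whether the authors use a hard cutoff, weighted sampling, or both is not stated in the abstract.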
Community
The best generalist image editing model. It can tackle all image editing tasks and outperforms the existing baselines by at least 20%. All the data and models will be available.
Tell me about this
There's a summary of this paper here: https://www.aimodels.fyi/papers/arxiv/omniedit-building-image-editing-generalist-models-through
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- FreeEdit: Mask-free Reference-based Image Editing with Multi-modal Instruction (2024)
- ReEdit: Multimodal Exemplar-Based Image Editing with Diffusion Models (2024)
- SGEdit: Bridging LLM with Text2Image Generative Model for Scene Graph-based Image Editing (2024)
- PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions (2024)
- DiT4Edit: Diffusion Transformer for Image Editing (2024)