MGIE
This repository contains the UNet and LLaVA model checkpoints from the paper Guiding Instruction-based Image Editing via Multimodal Large Language Models (ICLR 2024).
For a detailed usage example, refer to this notebook and the official repository. Additionally, this notebook is a memory-optimized version of the original one. It decouples the MGIE inference pipeline into two broad stages and frees memory between them:
- Compute all the embeddings in batches with the LLaVA model and the edit head.
- Remove the LLaVA model from memory to free up VRAM.
- Load the InstructPix2Pix pipeline and perform the editing.
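The two-stage flow above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the notebook's code: `run_two_stage`, `batched`, `load_encoder`, and `load_editor` are hypothetical names, and the toy loaders stand in for LLaVA plus the edit head (stage 1) and the InstructPix2Pix pipeline (stage 2). The point it shows is ordering: the encoder is loaded, used for every instruction, and released before the editor is loaded, so the two never occupy memory at the same time.

```python
import gc
from typing import Callable, Dict, List, Sequence

Embedding = List[float]


def batched(items: Sequence[str], batch_size: int) -> List[Sequence[str]]:
    """Split items into consecutive batches of at most batch_size."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def run_two_stage(
    instructions: Sequence[str],
    load_encoder: Callable[[], Callable[[Sequence[str]], List[Embedding]]],
    load_editor: Callable[[], Callable[[Embedding], str]],
    batch_size: int = 4,
) -> Dict[str, str]:
    # Stage 1 (hypothetical stand-in for LLaVA + edit head):
    # load the encoder and compute every embedding, one batch at a time
    encoder = load_encoder()
    cache: Dict[str, Embedding] = {}
    for batch in batched(instructions, batch_size):
        for text, emb in zip(batch, encoder(batch)):
            cache[text] = emb

    # Release the encoder before stage 2; with PyTorch on a GPU one would
    # typically also call torch.cuda.empty_cache() at this point
    del encoder
    gc.collect()

    # Stage 2 (hypothetical stand-in for InstructPix2Pix):
    # load the editor and consume only the cached embeddings
    editor = load_editor()
    return {text: editor(emb) for text, emb in cache.items()}


# Toy loaders (hypothetical): the "embedding" is just the instruction length
def toy_encoder_loader():
    return lambda batch: [[float(len(text))] for text in batch]


def toy_editor_loader():
    return lambda emb: f"edited(len={int(emb[0])})"


results = run_two_stage(
    ["make it snowy", "add a hat"], toy_encoder_loader, toy_editor_loader, batch_size=1
)
print(results)
```

Because stage 2 only needs the cached embeddings, the batch size in stage 1 can be tuned independently of the editing resolution.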
💡 MGIE requires additional setup steps that must be completed before running inference. Please refer to the repository for those instructions. Importantly, you need to merge the LLaVA weight deltas with the original LLaMA parameters; more details are in the repository.
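Conceptually, merging weight deltas means adding each delta to the base parameter of the same name. The sketch below illustrates only that idea on toy nested lists. `apply_delta` is a hypothetical helper, it assumes the deltas are purely additive and share the base's parameter names and shapes, and it is not a substitute for the merge procedure described in the repository (real checkpoints are large tensors and may contain parameters, such as embeddings for added tokens, that need special handling).

```python
from typing import Dict, List

Matrix = List[List[float]]


def apply_delta(base: Dict[str, Matrix], delta: Dict[str, Matrix]) -> Dict[str, Matrix]:
    """Return merged weights as base + delta, parameter by parameter.

    Assumption: the delta is additive and uses the same names and shapes as base.
    """
    merged: Dict[str, Matrix] = {}
    for name, d in delta.items():
        if name not in base:
            raise KeyError(f"parameter {name!r} missing from base weights")
        b = base[name]
        if len(b) != len(d) or any(len(rb) != len(rd) for rb, rd in zip(b, d)):
            raise ValueError(f"shape mismatch for {name!r}")
        # Elementwise addition; base and delta are left unmodified
        merged[name] = [[x + y for x, y in zip(rb, rd)] for rb, rd in zip(b, d)]
    return merged


base = {"layer.weight": [[1.0, 2.0], [3.0, 4.0]]}
delta = {"layer.weight": [[0.5, -2.0], [0.0, 1.0]]}
print(apply_delta(base, delta))  # {'layer.weight': [[1.5, 0.0], [3.0, 5.0]]}
```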
Processing ultra high-resolution images
Since the InstructPix2Pix pipeline doesn't resize input images internally, you might run into out-of-memory (OOM) errors when processing ultra high-resolution images like this one.
It is therefore recommended to resize such images while preserving their aspect ratio. Here is a utility function for that:
```python
from diffusers.utils import load_image
from PIL import Image


def resize_image_aspect_ratio(img_url, base_width=None, base_height=None):
    # Load the image (local path or URL) and convert it to RGB
    img = load_image(img_url).convert("RGB")
    # Get the current width and height of the image
    width, height = img.size
    # Calculate the new dimensions based on the aspect ratio
    if base_width is not None:
        # Calculate the new height from base_width to maintain the aspect ratio
        h_size = round(height * base_width / width)
        new_size = (base_width, h_size)
    elif base_height is not None:
        # Calculate the new width from base_height to maintain the aspect ratio
        w_size = round(width * base_height / height)
        new_size = (w_size, base_height)
    else:
        raise ValueError("Either base_width or base_height must be provided")
    # Resize the image; Image.ANTIALIAS was removed in Pillow 10,
    # Image.LANCZOS is the same resampling filter
    resized_img = img.resize(new_size, Image.LANCZOS)
    return resized_img
```
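To check the resulting dimensions before loading any image, the size arithmetic of `resize_image_aspect_ratio` can be isolated into a small pure-Python helper. `target_size` is a hypothetical name introduced here for illustration; it computes the scaled side with a single division (`height * base_width / width`) and rounds to the nearest pixel.

```python
from typing import Optional, Tuple


def target_size(
    width: int,
    height: int,
    base_width: Optional[int] = None,
    base_height: Optional[int] = None,
) -> Tuple[int, int]:
    """Return (new_width, new_height) preserving the width/height ratio."""
    if base_width is not None:
        # Scale the height by base_width / width
        return base_width, round(height * base_width / width)
    if base_height is not None:
        # Scale the width by base_height / height
        return round(width * base_height / height), base_height
    raise ValueError("Either base_width or base_height must be provided")


# A 4000x3000 landscape image constrained to a width of 1024
print(target_size(4000, 3000, base_width=1024))  # (1024, 768)
# A 3000x4000 portrait image constrained to a height of 768
print(target_size(3000, 4000, base_height=768))  # (576, 768)
```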
Citation
```bibtex
@inproceedings{fu2024mgie,
  author = {Tsu-Jui Fu and Wenze Hu and Xianzhi Du and William Yang Wang and Yinfei Yang and Zhe Gan},
  title = {{Guiding Instruction-based Image Editing via Multimodal Large Language Models}},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year = {2024}
}
```