This model card provides SDXL LoRA weights trained with Noise-Conditioned Perceptual Preference Optimization.
Combining Noise-Conditioned Perception with DPO significantly outperforms the baseline DPO method in both training speed and overall quality, as measured by human preferences.
We publish LoRA weights trained for 10 H100 GPU hours on a small subset of the Pick-a-Picv2 dataset. We removed all non-absolute winners for each prompt; our final prompts and image ids can be found here
Training code: https://github.com/sakharok13/Aligning-Stable-Diffusion-with-Noise-Conditioned-Perception
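The filtering step mentioned above can be illustrated with a minimal sketch. It assumes an "absolute winner" is an image that won at least one comparison for its prompt and never lost one; that reading, the `(prompt, winner_id, loser_id)` record layout, and the function name `absolute_winners` are all assumptions made for illustration, not the authors' actual preprocessing code. Tied comparisons are ignored here.

```python
from collections import defaultdict


def absolute_winners(comparisons):
    """Return {prompt: sorted image ids that won at least once and never lost}.

    `comparisons` is an iterable of (prompt, winner_id, loser_id) tuples.
    This record layout is hypothetical, not the Pick-a-Picv2 schema.
    """
    won = defaultdict(set)
    lost = defaultdict(set)
    for prompt, winner_id, loser_id in comparisons:
        won[prompt].add(winner_id)
        lost[prompt].add(loser_id)

    result = {}
    for prompt in won:
        # Keep only images that never appear on the losing side for this prompt.
        keep = won[prompt] - lost[prompt]
        if keep:
            result[prompt] = sorted(keep)
    return result


# Toy data: img_a beats everything; the "blue car" prompt has no clear winner.
comparisons = [
    ("a red fox", "img_a", "img_b"),
    ("a red fox", "img_b", "img_c"),
    ("a red fox", "img_a", "img_c"),
    ("a blue car", "img_x", "img_y"),
    ("a blue car", "img_y", "img_x"),
]
kept = absolute_winners(comparisons)
print(kept)  # {'a red fox': ['img_a']}
```

Under this assumed definition, prompts whose images each lose at least once are dropped entirely, which is one way a filtered subset ends up much smaller than the full dataset.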
To try our LoRA weights and compare with a baseline:
from diffusers import DiffusionPipeline
import torch
pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, use_safetensors=True, variant="fp16")
pipe.to("cuda")
prompt = "An astronaut riding a green horse"
torch.manual_seed(10)
original_image = pipe(prompt=prompt).images[0]
pipe.load_lora_weights("alexgambashidze/SDXL_NCP-DPO_v0.1", weight_name="pytorch_lora_weights.safetensors")
torch.manual_seed(10)
ncp_dpo_image = pipe(prompt=prompt).images[0]
Limitations: The Pick-a-Picv2 dataset is extremely biased. It contains NSFW generations and focuses on people and characters.
Core contributors:
Alexander Gambashidze
Yuri Sosnin
Anton Kulikov