license: mit

Momo XL - Anime-Style SDXL Base Model

Momo XL is an anime-style model based on SDXL, fine-tuned to produce high-quality anime-style images with detailed and vibrant aesthetics. (Oct 6, 2024)

Key Features:

Anime-Focused SDXL: Tailored for generating high-quality anime-style images, making it ideal for artists and enthusiasts.
Optimized for Tag-Based Prompting: Works best when prompted with descriptive tags, ensuring accurate and relevant outputs.
LoRA Compatible: Compatible with most LoRA models available on the hub, allowing for versatile customization and style transfer.

Usage Instructions:

Tagging: Use descriptive tags separated by commas to guide the image generation. Tags can be arranged in any order to suit your creative needs.
Year-Specific Styles: To emulate art styles from a specific year, use the tag format "year 20XX" (e.g., "year 2023").
LoRA Models: Momo XL supports most LoRA models, enabling enhanced and tailored outputs for your projects.
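As a rough illustration of the tagging guidance above, here is a minimal sketch of a prompt builder. The helper name `build_prompt` and the example tags are hypothetical and not part of Momo XL; the only conventions taken from this card are comma-separated tags and the "year 20XX" tag format.

```python
def build_prompt(tags, year=None):
    """Join tags into a comma-separated prompt, optionally adding a year tag.

    `build_prompt` is a hypothetical helper for illustration only.
    """
    # Drop empty entries and trim stray whitespace around each tag.
    cleaned = [t.strip() for t in tags if t and t.strip()]
    if year is not None:
        # Year tag format described in this card: "year 20XX".
        cleaned.append(f"year {year}")
    return ", ".join(cleaned)


# Example tags are illustrative; tag order is free according to the card.
prompt = build_prompt(["1girl", "solo", "cherry blossoms"], year=2023)
print(prompt)  # 1girl, solo, cherry blossoms, year 2023
```

The resulting string can be passed as the prompt to whichever SDXL front end or pipeline you use.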

Disclaimer:

This model may produce unexpected or unintended results. Use with caution and at your own risk.

Important Notice:

Ethical Use: Please ensure that your use of this model is ethical and complies with all applicable laws and regulations.
Content Responsibility: Users are responsible for the content they generate. Do not use the model to create or disseminate illegal, harmful, or offensive material.
Data Sources: The model was trained on publicly available datasets. While efforts have been made to filter and curate the training data, some undesirable content may remain.

Thank you! 😊

Momo XL - Training Details (Oct 15, 2024)

Dataset

Momo XL was trained on a dataset of over 400,000 images sourced from Danbooru.

Base Model

Momo XL was built on top of SDXL, incorporating knowledge from two fine-tuned models, Animagine 3.0 base and Pony V6, each added as a weight difference relative to SDXL base:

Formula:
SDXL_base + (Animagine 3.0 base - SDXL_base) * 1.0 + (Pony V6 - SDXL_base) * 0.5
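The formula above can be read as an "add difference" merge applied to every weight. The sketch below uses plain Python lists in place of real tensors; the function name `add_difference_merge` and the toy values are assumptions for illustration, not the actual merge script. Because the Animagine weight is 1.0, the result is algebraically the same as Animagine 3.0 base plus half of (Pony V6 - SDXL_base).

```python
def add_difference_merge(base, animagine, pony, w_animagine=1.0, w_pony=0.5):
    """Apply base + (animagine - base) * w_animagine + (pony - base) * w_pony per weight.

    Each argument maps a parameter name to a flat list of floats
    (a stand-in for real checkpoint tensors; hypothetical layout).
    """
    merged = {}
    for key, base_values in base.items():
        merged[key] = [
            b + (a - b) * w_animagine + (p - b) * w_pony
            for b, a, p in zip(base_values, animagine[key], pony[key])
        ]
    return merged


# Toy weights, chosen only to make the arithmetic easy to follow.
base = {"layer.weight": [1.0, 2.0]}
animagine = {"layer.weight": [1.5, 2.0]}
pony = {"layer.weight": [2.0, 4.0]}

merged = add_difference_merge(base, animagine, pony)
print(merged)  # {'layer.weight': [2.0, 3.0]}
```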

For more details:

Training Process

Training was conducted on A100 80GB GPUs, totaling over 2,000 GPU hours. The training was divided into three stages:

Finetuning - First Stage: Trained on the entire dataset at 1024² resolution with the AdamW8bit optimizer.
Finetuning - Second Stage: Also trained on the entire dataset, with the maximum resolution raised to 1280² and the optimizer switched to AdamW.
Adjustment Stage: A short run (0.25 epochs) focused on aesthetic adjustments to improve the overall visual quality.

The released model, Momo XL, combines the Text Encoder from the second finetuning stage with the UNet from the Adjustment Stage.
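One way to picture this combination is as a selection of checkpoint keys by prefix. The sketch below is an assumption-heavy illustration, not the actual release script: the prefixes follow the common SDXL single-file checkpoint layout, the function name `combine_checkpoints` is hypothetical, and taking the remaining keys (such as the VAE) from the Adjustment Stage is a guess, since the card does not say where they come from.

```python
# Key prefixes assumed from the usual SDXL single-file checkpoint layout.
UNET_PREFIX = "model.diffusion_model."
TEXT_ENCODER_PREFIX = "conditioner.embedders."


def combine_checkpoints(stage2, adjustment):
    """Take text encoder weights from stage 2 and everything else from the adjustment stage."""
    combined = {}
    for key, value in adjustment.items():
        # UNet keys come from the Adjustment Stage (stated in the card).
        # Other non-text-encoder keys also come from it (assumption).
        if not key.startswith(TEXT_ENCODER_PREFIX):
            combined[key] = value
    for key, value in stage2.items():
        # Text encoder keys come from the second finetuning stage (stated in the card).
        if key.startswith(TEXT_ENCODER_PREFIX):
            combined[key] = value
    return combined


# Toy state dicts with string placeholders instead of tensors.
stage2 = {UNET_PREFIX + "w": "unet_s2", TEXT_ENCODER_PREFIX + "0.w": "te_s2"}
adjustment = {UNET_PREFIX + "w": "unet_adj", TEXT_ENCODER_PREFIX + "0.w": "te_adj"}

final = combine_checkpoints(stage2, adjustment)
print(final)
```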

Hyperparameters

Stage	Epochs	UNet lr	Text Encoder lr	Batch Size	Resolution	Noise Offset	Optimizer	LR Scheduler
Finetuning 1st Stage	10	2e-5	1e-5	256	1024²	N/A	AdamW8bit	Constant
Finetuning 2nd Stage	10	2e-5	1e-5	256	Max. 1280²	N/A	AdamW	Constant
Adjustment Stage	0.25	8e-5	4e-5	1024	Max. 1280²	0.05	AdamW	Constant
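To get a feel for what the table implies, the sketch below estimates optimizer steps per stage from epochs and batch size. It assumes exactly 400,000 images (the card only says "over 400,000") and that the Adjustment Stage used the full dataset, which the card does not state; treat the numbers as rough lower bounds rather than reported values.

```python
import math


def optimizer_steps(num_images, batch_size, epochs):
    """Estimate the number of optimizer steps for a stage, rounded up."""
    return math.ceil(num_images * epochs / batch_size)


DATASET_SIZE = 400_000  # assumption: lower bound taken from "over 400,000 images"

# (stage name, epochs, batch size) copied from the hyperparameter table.
stages = [
    ("Finetuning 1st Stage", 10, 256),
    ("Finetuning 2nd Stage", 10, 256),
    ("Adjustment Stage", 0.25, 1024),
]

for name, epochs, batch_size in stages:
    steps = optimizer_steps(DATASET_SIZE, batch_size, epochs)
    print(f"{name}: ~{steps} steps")
```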