license: mit
Momo XL - Anime-Style SDXL Base Model
Momo XL is an SDXL-based model fine-tuned to produce high-quality anime-style images with detailed, vibrant aesthetics. (Oct 6, 2024)
Key Features:
- Anime-Focused SDXL: Tailored for generating high-quality anime-style images, making it ideal for artists and enthusiasts.
- Optimized for Tag-Based Prompting: Works best when prompted with descriptive tags, ensuring accurate and relevant outputs.
- LoRA Compatible: Compatible with most LoRA models available on the hub, allowing for versatile customization and style transfer.
Usage Instructions:
- Tagging: Use descriptive tags separated by commas to guide the image generation. Tags can be arranged in any order to suit your creative needs.
- Year-Specific Styles: To emulate art styles from a specific year, use the tag format "year 20XX" (e.g., "year 2023").
- LoRA Models: Momo XL supports most LoRA models, enabling enhanced and tailored outputs for your projects.
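The tagging conventions above can be sketched as a small helper. This is only an illustration: `build_prompt` is a hypothetical function, not part of any Momo XL tooling, and the example tags are placeholders chosen for demonstration.

```python
def build_prompt(tags, year=None):
    """Join descriptive tags into a comma-separated prompt.

    If `year` is given, a "year 20XX" tag is appended, following the
    year-specific tag format described in the usage instructions.
    """
    # Strip surrounding whitespace and drop empty entries; keep the caller's order.
    cleaned = [t.strip() for t in tags if t and t.strip()]
    if year is not None:
        cleaned.append(f"year {year}")
    return ", ".join(cleaned)


# Illustrative tags only; any descriptive tags can be used.
prompt = build_prompt(["1girl", "long hair", "cherry blossoms", "outdoors"], year=2023)
print(prompt)  # 1girl, long hair, cherry blossoms, outdoors, year 2023
```

The resulting string can be passed as the prompt to whichever SDXL front end or pipeline you use.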
Disclaimer:
This model may produce unexpected or unintended results. Use with caution and at your own risk.
Important Notice:
- Ethical Use: Please ensure that your use of this model is ethical and complies with all applicable laws and regulations.
- Content Responsibility: Users are responsible for the content they generate. Do not use the model to create or disseminate illegal, harmful, or offensive material.
- Data Sources: The model was trained on publicly available datasets. While efforts have been made to filter and curate the training data, some undesirable content may remain.
Thank you! 😊
Momo XL - Training Details (Oct 15, 2024)
Dataset
Momo XL was trained on a dataset of over 400,000 images sourced from Danbooru.
Base Model
Momo XL was built on top of SDXL, incorporating knowledge from two finetuned models, Animagine 3.0 base and Pony V6:
- Formula:
SDXL_base + (Animagine 3.0 base - SDXL_base) * 1.0 + (Pony V6 - SDXL_base) * 0.5
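To make the formula concrete, here is a minimal sketch that applies it per weight. It assumes all three checkpoints share the same keys and shapes, and it uses plain Python lists in place of real tensors. `merge_add_difference` is a hypothetical name, and this is not necessarily the tooling used to build Momo XL.

```python
def merge_add_difference(base, animagine, pony, w_animagine=1.0, w_pony=0.5):
    """Apply base + (animagine - base) * w_animagine + (pony - base) * w_pony
    to every weight. Each argument maps a parameter name to a list of floats
    (a stand-in for a tensor)."""
    merged = {}
    for key, base_vals in base.items():
        ani_vals = animagine[key]
        pony_vals = pony[key]
        merged[key] = [
            b + (a - b) * w_animagine + (p - b) * w_pony
            for b, a, p in zip(base_vals, ani_vals, pony_vals)
        ]
    return merged


# Toy weights for illustration only.
base = {"w": [1.0, 2.0]}
animagine = {"w": [2.0, 2.0]}
pony = {"w": [1.0, 4.0]}
print(merge_add_difference(base, animagine, pony))  # {'w': [2.0, 3.0]}
```

Because the Animagine weight is 1.0, the formula simplifies algebraically to Animagine 3.0 base + 0.5 * (Pony V6 - SDXL_base).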
For more details:
Training Process
Training was conducted on A100 80GB GPUs, totaling over 2,000 GPU hours. The training was divided into three stages:
- Finetuning - First Stage: Trained on the entire dataset with a defined set of training configurations.
- Finetuning - Second Stage: Also trained on the entire dataset with some variations in settings.
- Adjustment Stage: Focused on aesthetic adjustments to improve the overall visual quality.
The released model, Momo XL, combines the Text Encoder from the second finetuning stage with the UNet from the Adjustment Stage.
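One way to picture this component-level merge is as a selection of weights by key prefix. The sketch below is an assumption-laden illustration: the prefixes follow the layout commonly seen in single-file SDXL checkpoints, `combine_components` is a hypothetical helper, and the actual procedure used for Momo XL is not documented here. Components other than the UNet and Text Encoder (for example the VAE) are left out because the source does not say where they come from.

```python
# Assumed key prefixes (common in single-file SDXL checkpoints; verify on your file).
UNET_PREFIX = "model.diffusion_model."
TEXT_ENCODER_PREFIX = "conditioner.embedders."


def combine_components(stage2_ckpt, adjustment_ckpt):
    """Take Text Encoder weights from the second-stage checkpoint and
    UNet weights from the adjustment-stage checkpoint."""
    combined = {}
    for key, value in stage2_ckpt.items():
        if key.startswith(TEXT_ENCODER_PREFIX):
            combined[key] = value
    for key, value in adjustment_ckpt.items():
        if key.startswith(UNET_PREFIX):
            combined[key] = value
    return combined


# Toy checkpoints: strings stand in for tensors so the origin of each weight is visible.
stage2 = {
    "model.diffusion_model.w": "unet@stage2",
    "conditioner.embedders.0.w": "te@stage2",
}
adjustment = {
    "model.diffusion_model.w": "unet@adjust",
    "conditioner.embedders.0.w": "te@adjust",
}
print(combine_components(stage2, adjustment))
```

In this toy run the UNet entry comes from the adjustment checkpoint and the Text Encoder entry from the second-stage checkpoint.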
Hyperparameters
Stage | Epochs | UNet lr | Text Encoder lr | Batch Size | Resolution | Noise Offset | Optimizer | LR Scheduler |
---|---|---|---|---|---|---|---|---|
Finetuning 1st Stage | 10 | 2e-5 | 1e-5 | 256 | 1024² | N/A | AdamW8bit | Constant |
Finetuning 2nd Stage | 10 | 2e-5 | 1e-5 | 256 | Max. 1280² | N/A | AdamW | Constant |
Adjustment Stage | 0.25 | 8e-5 | 4e-5 | 1024 | Max. 1280² | 0.05 | AdamW | Constant |
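As a rough reading aid for the Epochs and Batch Size columns, the sketch below estimates the number of optimizer steps they imply. It assumes exactly 400,000 images (the card only says "over 400,000"), one optimizer step per batch, and that the listed batch size is the effective one. `estimate_steps` is a hypothetical helper. The Adjustment Stage is not computed because the size of the data it used is not stated.

```python
import math


def estimate_steps(dataset_size, epochs, batch_size):
    """Approximate optimizer steps: total samples seen divided by batch size."""
    return math.ceil(dataset_size * epochs / batch_size)


# Finetuning stages: 10 epochs at batch size 256 over an assumed 400,000 images.
print(estimate_steps(400_000, 10, 256))  # 15625
```

Under these assumptions, each finetuning stage works out to roughly 15,600 steps; the true figure depends on the exact dataset size.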