---
license: mit
---

Momo XL - Anime-Style SDXL Base Model

Momo XL is an anime-style model based on SDXL, fine-tuned to produce high-quality anime-style images with detailed and vibrant aesthetics. (Oct 6, 2024)

Key Features:

  • Anime-Focused SDXL: Tailored for generating high-quality anime-style images, making it ideal for artists and enthusiasts.
  • Optimized for Tag-Based Prompting: Works best when prompted with descriptive tags, ensuring accurate and relevant outputs.
  • LoRA Compatible: Compatible with most LoRA models available on the hub, allowing for versatile customization and style transfer.

Usage Instructions:

  • Tagging: Use descriptive tags separated by commas to guide the image generation. Tags can be arranged in any order to suit your creative needs.
  • Year-Specific Styles: To emulate art styles from a specific year, use the tag format "year 20XX" (e.g., "year 2023").
  • LoRA Models: Momo XL supports most LoRA models, enabling enhanced and tailored outputs for your projects.
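
The tagging conventions above can be sketched as a small prompt-building helper. This is a minimal illustration (the helper name and example tags are assumptions, not part of this card); the resulting string is what you would pass as the prompt to any SDXL-compatible pipeline, such as diffusers' `StableDiffusionXLPipeline`.

```python
# Minimal sketch of assembling a tag-based prompt for Momo XL.
# Helper name and example tags are illustrative only.
def build_prompt(tags, year=None):
    """Join descriptive tags with commas; optionally append a
    'year 20XX' tag as described in the usage notes above."""
    parts = list(tags)
    if year is not None:
        parts.append(f"year {year}")
    return ", ".join(parts)

prompt = build_prompt(["1girl", "silver hair", "school uniform"], year=2023)
print(prompt)  # 1girl, silver hair, school uniform, year 2023
```

Because tags can be arranged in any order, the helper simply preserves the order you supply.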

Disclaimer:

This model may produce unexpected or unintended results. Use with caution and at your own risk.

Important Notice:

  • Ethical Use: Please ensure that your use of this model is ethical and complies with all applicable laws and regulations.
  • Content Responsibility: Users are responsible for the content they generate. Do not use the model to create or disseminate illegal, harmful, or offensive material.
  • Data Sources: The model was trained on publicly available datasets. While efforts have been made to filter and curate the training data, some undesirable content may remain.

Thank you! 😊


Momo XL - Training Details (Oct 15, 2024)

Dataset

Momo XL was trained on a dataset of over 400,000 images sourced from Danbooru.

Base Model

Momo XL was built on top of SDXL, incorporating knowledge from two fine-tuned models:

  • Formula:
    SDXL_base + (Animagine 3.0 base - SDXL_base) * 1.0 + (Pony V6 - SDXL_base) * 0.5
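
This is an add-difference merge: each fine-tuned model's delta from the SDXL base is scaled and added back to the base. A minimal sketch, shown with scalar "parameters" for clarity; a real merge applies the same arithmetic to every tensor in the checkpoints' state dicts, and the toy values below are placeholders, not actual weights.

```python
# Minimal sketch of an add-difference model merge:
# merged = base + sum_i w_i * (model_i - base), per parameter.
def add_difference_merge(base, models_and_weights):
    merged = {}
    for name, base_w in base.items():
        delta = sum(w * (model[name] - base_w) for model, w in models_and_weights)
        merged[name] = base_w + delta
    return merged

# Toy values standing in for SDXL_base, Animagine 3.0, and Pony V6,
# using the weights from the formula (1.0 and 0.5):
sdxl = {"w": 1.0}
animagine = {"w": 2.0}
pony = {"w": 3.0}
merged = add_difference_merge(sdxl, [(animagine, 1.0), (pony, 0.5)])
print(merged["w"])  # 3.0
```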

Training Process

Training was conducted on A100 80GB GPUs, totaling over 2,000 GPU hours. The training was divided into three stages:

  • Finetuning - First Stage: Trained on the entire dataset with an initial training configuration.
  • Finetuning - Second Stage: Trained on the entire dataset again with a modified configuration (see the hyperparameters below).
  • Adjustment Stage: A short pass focused on aesthetic adjustments to improve overall visual quality.

The final model, Momo XL, was released by merging the Text Encoder from the Finetuning Second Stage with the UNet from the Adjustment Stage.
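
The component merge described above can be sketched at the state-dict level: keep the text-encoder weights from one checkpoint and the UNet weights from another. The key prefixes and values below are illustrative placeholders, not the actual checkpoint layout.

```python
# Sketch of assembling the release checkpoint from two training stages:
# text-encoder weights from the second finetuning stage, UNet weights
# from the adjustment stage. Key names are illustrative only.
def assemble_release(stage2_sd, adjustment_sd):
    final = {}
    for key, value in stage2_sd.items():
        if key.startswith("text_encoder."):
            final[key] = value
    for key, value in adjustment_sd.items():
        if key.startswith("unet."):
            final[key] = value
    return final

stage2 = {"text_encoder.layer0": 1, "unet.block0": 2}
adjustment = {"text_encoder.layer0": 3, "unet.block0": 4}
print(assemble_release(stage2, adjustment))
# {'text_encoder.layer0': 1, 'unet.block0': 4}
```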

Hyperparameters

| Stage | Epochs | UNet LR | Text Encoder LR | Batch Size | Resolution | Noise Offset | Optimizer | LR Scheduler |
|---|---|---|---|---|---|---|---|---|
| Finetuning 1st Stage | 10 | 2e-5 | 1e-5 | 256 | 1024² | N/A | AdamW8bit | Constant |
| Finetuning 2nd Stage | 10 | 2e-5 | 1e-5 | 256 | Max. 1280² | N/A | AdamW | Constant |
| Adjustment Stage | 0.25 | 8e-5 | 4e-5 | 1024 | Max. 1280² | 0.05 | AdamW | Constant |