PotatoBox committed on
Commit
bc25d31
1 Parent(s): 80f03ba

Update README.md

Files changed (1)
  1. README.md +33 -1
README.md CHANGED
@@ -41,7 +41,7 @@ license: mit
41
  <img src="./card_images/11.png" class="wide" alt="Sample Image 11">
42
  </div>
43
 
44
- **Momo XL** is an anime-style model based on SDXL, fine-tuned to produce high-quality anime-style images with detailed and vibrant aesthetics.
45
 
46
  ## Key Features:
47
 
@@ -66,3 +66,35 @@ This model may produce unexpected or unintended results. **Use with caution and
66
  - **Data Sources**: The model was trained on publicly available datasets. While efforts have been made to filter and curate the training data, some undesirable content may remain.
67
 
68
  Thank you! 😊
 
41
  <img src="./card_images/11.png" class="wide" alt="Sample Image 11">
42
  </div>
43
 
44
+ **Momo XL** is an anime-style model based on SDXL, fine-tuned to produce high-quality anime-style images with detailed and vibrant aesthetics. (Oct 6, 2024)
45
 
46
  ## Key Features:
47
 
 
66
  - **Data Sources**: The model was trained on publicly available datasets. While efforts have been made to filter and curate the training data, some undesirable content may remain.
67
 
68
  Thank you! 😊
69
+
70
+
71
+ ------------------------------------------------------
72
+ ## Momo XL - Training Details (Oct 15, 2024)
73
+
74
+ ### Dataset
75
+ Momo XL was trained on a dataset of more than **400,000 images** sourced from Danbooru.
76
+
77
+ ### Base Model
78
+ Momo XL was built on top of SDXL by adding the weight differences of two finetuned models, Animagine 3.0 base and Pony V6:
79
+ - Formula:
80
+ `SDXL_base + (Animagine 3.0 base - SDXL_base) * 1.0 + (Pony V6 - SDXL_base) * 0.5`
81
+
82
+ For more details:
83
+ - [Animagine 3.0 base](https://huggingface.co/Linaqruf/animagine-xl-3.0)
84
+ - [Pony V6](https://huggingface.co/LyliaEngine/Pony_Diffusion_V6_XL)
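
Reviewer note on the base-model formula above: it reads as an "add difference" merge, where each finetuned model contributes its offset from SDXL base, scaled by a weight. The sketch below is a minimal illustration under that reading, not the author's actual merge script. The `Weights` type, the `add_difference_merge` helper, and the toy numbers are all hypothetical; real checkpoints hold tensors rather than lists of floats.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a checkpoint state dict: parameter name -> values.
Weights = Dict[str, List[float]]


def add_difference_merge(
    base: Weights, finetunes: List[Tuple[Weights, float]]
) -> Weights:
    """Return base + sum(scale * (finetune - base)) over all (finetune, scale) pairs."""
    merged: Weights = {}
    for key, base_vals in base.items():
        out = list(base_vals)
        for weights, scale in finetunes:
            # Add this finetune's scaled offset from the base, element by element.
            out = [o + scale * (w - b) for o, w, b in zip(out, weights[key], base_vals)]
        merged[key] = out
    return merged


# Toy one-parameter "checkpoints"; the numbers are made up for illustration.
sdxl_base = {"layer.weight": [1.0, 2.0]}
animagine = {"layer.weight": [1.5, 2.0]}
pony = {"layer.weight": [1.0, 4.0]}

# SDXL_base + (Animagine - SDXL_base) * 1.0 + (Pony - SDXL_base) * 0.5
merged = add_difference_merge(sdxl_base, [(animagine, 1.0), (pony, 0.5)])
print(merged)
```

With a scale of 1.0 the Animagine offset is applied in full, while the Pony V6 offset is applied at half strength.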
85
+
86
+ ### Training Process
87
+ Training was conducted on **A100 80GB GPUs**, totaling more than **2,000 GPU hours**. The training was divided into three stages:
88
+ - **Finetuning - First Stage**: Trained on the entire dataset with the first-stage settings listed under Hyperparameters.
89
+ - **Finetuning - Second Stage**: Also trained on the entire dataset, with a higher maximum resolution (1280² instead of 1024²) and AdamW in place of AdamW8bit.
90
+ - **Adjustment Stage**: Focused on aesthetic adjustments to improve the overall visual quality.
91
+
92
+ The released model, **Momo XL**, combines the Text Encoder from the Finetuning Second Stage with the UNet from the Adjustment Stage.
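
Reviewer note on the component merge described above: taking the Text Encoder from one checkpoint and the UNet from another amounts to selecting parameters by key prefix. The sketch below assumes hypothetical prefixes `text_encoder.` and `unet.` and a hypothetical `combine_components` helper; actual SDXL checkpoint key names differ by format, and the README does not say how the author performed this step.

```python
from typing import Any, Dict


def combine_components(
    te_source: Dict[str, Any],
    unet_source: Dict[str, Any],
    te_prefix: str = "text_encoder.",  # assumed prefix, not taken from the README
    unet_prefix: str = "unet.",        # assumed prefix, not taken from the README
) -> Dict[str, Any]:
    """Take UNet parameters from one checkpoint and Text Encoder parameters from another."""
    combined: Dict[str, Any] = {}
    for key, value in unet_source.items():
        if key.startswith(unet_prefix):
            combined[key] = value
    for key, value in te_source.items():
        if key.startswith(te_prefix):
            combined[key] = value
    # Other components (for example a VAE) are left out here because the
    # README does not say which stage they come from.
    return combined


# Toy checkpoints with string placeholders instead of tensors.
second_stage = {"text_encoder.w": "te-stage2", "unet.w": "unet-stage2"}
adjustment = {"text_encoder.w": "te-adjust", "unet.w": "unet-adjust"}

momo = combine_components(te_source=second_stage, unet_source=adjustment)
print(momo)
```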
93
+
94
+ ### Hyperparameters
95
+
96
+ | Stage | Epochs | UNet LR | Text Encoder LR | Batch Size | Resolution | Noise Offset | Optimizer | LR Scheduler |
97
+ |--------------------------|--------|---------|-----------------|------------|------------|--------------|------------|--------------|
98
+ | **Finetuning 1st Stage** | 10 | 2e-5 | 1e-5 | 256 | 1024² | N/A | AdamW8bit | Constant |
99
+ | **Finetuning 2nd Stage** | 10 | 2e-5 | 1e-5 | 256 | Max. 1280² | N/A | AdamW | Constant |
100
+ | **Adjustment Stage** | 0.25 | 8e-5 | 4e-5 | 1024 | Max. 1280² | 0.05 | AdamW | Constant |
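
Reviewer note on the table above: the epoch and batch-size columns give a rough sense of how many optimizer steps the two full-dataset stages imply. The estimate below is back-of-the-envelope arithmetic only. It assumes a dataset of exactly 400,000 images (the README says "over" that figure) and that "Batch Size" is the effective global batch; neither is confirmed by the source. The Adjustment Stage is omitted because the README does not state how much data it used.

```python
import math

DATASET_SIZE = 400_000  # lower bound; the README reports more than 400,000 images


def optimizer_steps(epochs: int, batch_size: int, dataset_size: int = DATASET_SIZE) -> int:
    """Estimate optimizer steps, assuming one step per global batch."""
    steps_per_epoch = math.ceil(dataset_size / batch_size)
    return epochs * steps_per_epoch


# Epochs and batch sizes are taken from the Hyperparameters table.
stage_steps = {
    "Finetuning 1st Stage": optimizer_steps(epochs=10, batch_size=256),
    "Finetuning 2nd Stage": optimizer_steps(epochs=10, batch_size=256),
}
print(stage_steps)
```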