DiffSinger (OpenVPI maintained version)
This is a refactored and enhanced version of DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism, based on the original paper and implementation. It provides:
- Cleaner code structure: useless and redundant files are removed and the others are re-organized.
- Better sound quality: the sampling rate of synthesized audio is adapted to 44.1 kHz instead of the original 24 kHz.
- Higher fidelity: improved acoustic models and diffusion sampling acceleration algorithms are integrated.
- More controllability: variance models and parameters are introduced for prediction and control of pitch, energy, breathiness, etc.
- Production compatibility: functionalities are designed to match the requirements of production deployment and the SVS communities.
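As a rough illustration of what the higher sampling rate means in practice, the sketch below computes the highest frequency each rate can represent (the Nyquist limit, half the sampling rate). This is general signal-processing arithmetic, not code from this repository; the function name `nyquist_hz` is made up for the example.

```python
def nyquist_hz(sample_rate_hz: float) -> float:
    """Return the highest frequency representable at the given sampling rate."""
    return sample_rate_hz / 2.0


# Sampling rates quoted in the feature list above.
original_sr = 24_000
current_sr = 44_100

for label, sr in (("original", original_sr), ("this version", current_sr)):
    print(f"{label}: {sr} Hz sampling -> content up to {nyquist_hz(sr):.0f} Hz")
```

At 24 kHz, nothing above 12 kHz can be represented, while 44.1 kHz extends the limit to 22.05 kHz, which is roughly the upper bound of human hearing.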
[Figures: Overview | Variance Model | Acoustic Model]
User Guidance
- Installation & basic usage: See Getting Started
- Dataset creation pipelines & tools: See MakeDiffSinger
- Best practices & tutorials: See Best Practices
- Editing configurations: See Configuration Schemas
- Deployment & production: OpenUTAU for DiffSinger, DiffScope (under development)
- Communication groups: QQ Group (907879266), Discord server
Progress & Roadmap
- Progress since we forked into this repository: See Releases
- Roadmap for future releases: See Project Board
- Thoughts, proposals & ideas: See Discussions
Architecture & Algorithms
TBD
Development Resources
TBD
References
Original Paper & Implementation
- Paper: DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism
- Implementation: MoonInTheRiver/DiffSinger
Generative Models & Algorithms
- Denoising Diffusion Probabilistic Models (DDPM): paper, implementation
- DDIM for diffusion sampling acceleration
- PNDM for diffusion sampling acceleration
- DPM-Solver++ for diffusion sampling acceleration
- UniPC for diffusion sampling acceleration
- Rectified Flow (RF): paper, implementation
Dependencies & Submodules
- HiFi-GAN and NSF for waveform reconstruction
- pc-ddsp for waveform reconstruction
- RMVPE and yxlllc's fork for pitch extraction
- Vocal Remover and yxlllc's fork for harmonic-noise separation
Disclaimer
Any organization or individual is prohibited from using any functionality included in this repository to generate someone's speech without their consent, including but not limited to government leaders, political figures, and celebrities. Failure to comply with this item may constitute a violation of copyright laws.
License
This forked DiffSinger repository is licensed under the Apache 2.0 License.