Update readme
README.md
CHANGED
@@ -36,7 +36,32 @@ Based on [ImprovedDiffusion by openai](https://github.com/openai/improved-diffus

 ## How do I use this?

-
+Check out the colab: https://colab.research.google.com/drive/1wVVqUPqwiDBUVeWWOUNglpGhU3hg_cbR?usp=sharing
+
+Or on a computer with a GPU (with >=16GB of VRAM):
+```shell
+git clone https://github.com/neonbjb/tortoise-tts.git
+cd tortoise-tts
+pip install -r requirements.txt
+python do_tts.py
+```
+
+## Hand-picked TTS samples
+
+I generated ~250 samples from 23 text prompts and 8 voices. The text prompts have never been seen by the model. The
+voices were pulled from the training set.
+
+All of the samples can be found in the results/ folder of this repo.
+
+I handpicked a few to show what the model is capable of:
+[Atkins - Road not taken](results/favorites/atkins_road_not_taken.wav)
+[Dotrice - Rolling Stone interview](results/favorites/dotrice_rollingstone.wav)
+[Dotrice - 'Ornaments' from tacotron test set](results/favorites/dotrice_tacotron_samp1.wav)
+[Kennard - 'Acute emotional intelligence' from tacotron test set](results/favorites/kennard_tacotron_samp2.wav)
+[Mol - Because I could not stop for death](results/favorites/mol_dickenson.wav)
+[Mol - Obama](results/favorites/mol_obama.wav)
+
+Prosody is remarkably good for poetry, despite the fact that it was never trained on poetry.

 ## How do I train this?

@@ -44,4 +69,10 @@ Frankly - you don't. Building this model has been a labor of love for me, consum
 resources for the better part of 6 months. It uses a dataset I've gathered, refined and transcribed that consists of
 a lot of audio data which I cannot distribute because of copywrite or no open licenses.

-With that said, I'm willing to help you out if you really want to give it a shot. DM me.
+With that said, I'm willing to help you out if you really want to give it a shot. DM me.
+
+## Looking forward
+
+I'm not satisfied with this yet. Treat this as a "sneak peek" and check back in a couple of months. I think the concept
+is sound, but there are a few hurdles to overcome to get sample quality up. I have been doing major tweaks to the
+diffusion model and should have something new and much better soon.
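
The local-install path added in this change asks for a GPU with >=16GB of VRAM. As a rough pre-flight check before running `do_tts.py`, something like the sketch below might help. It assumes an NVIDIA GPU with `nvidia-smi` on the PATH. The helper names (`parse_vram_mib`, `has_enough_vram`, `query_vram_mib`) and the 10% tolerance are hypothetical and not part of tortoise-tts; the tolerance is there because cards marketed as 16GB usually report slightly less total memory.

```python
import shutil
import subprocess

REQUIRED_GIB = 16   # taken from the README's ">=16GB of VRAM" note
TOLERANCE = 0.9     # assumption: accept cards reporting a bit under the marketed size


def parse_vram_mib(smi_output: str) -> list[int]:
    """Parse one total-memory value (in MiB) per GPU from nvidia-smi CSV output."""
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip()]


def has_enough_vram(vram_mib: list[int], required_gib: float = REQUIRED_GIB) -> bool:
    """Return True if at least one GPU reports roughly `required_gib` GiB or more."""
    threshold_mib = required_gib * 1024 * TOLERANCE
    return any(mib >= threshold_mib for mib in vram_mib)


def query_vram_mib() -> list[int]:
    """Ask nvidia-smi for total memory per GPU; return [] if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_vram_mib(result.stdout) if result.returncode == 0 else []


if __name__ == "__main__":
    if has_enough_vram(query_vram_mib()):
        print("At least one GPU roughly meets the 16GB guideline.")
    else:
        print("No suitable GPU found; the colab may be the easier route.")
```

If the check fails, the colab link from the same README section is the fallback the author already suggests.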