XLS-R-based CTC model with 5-gram language model from Common Voice
This model is a version of facebook/wav2vec2-xls-r-2b-22-to-16 fine-tuned mainly on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - NL dataset (see details below), to which a small 5-gram language model built from the Common Voice training corpus has been added. It achieves the following results on the Common Voice 8.0 evaluation set:
- WER: 0.0669
- CER: 0.0197
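For readers unfamiliar with these metrics, the sketch below shows one common way to compute word error rate and character error rate: the edit distance between reference and hypothesis, divided by the reference length. It is not the evaluation script used for this model. The sentence pair is invented for illustration, and counting spaces as characters in the CER is an assumption.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over the number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate; spaces count as characters here (an assumption)."""
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical reference/hypothesis pair, not taken from the evaluation set.
reference = "de kat zit op de mat"
hypothesis = "de kat zat op mat"
print(f"WER: {wer(reference, hypothesis):.4f}")
print(f"CER: {cer(reference, hypothesis):.4f}")
```

On this scale, a WER of 0.0669 corresponds to roughly 6.7 word errors per 100 reference words.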
Model description
The model takes 16 kHz audio as input and uses a Wav2Vec2ForCTC decoder with a vocabulary of 48 letters to produce the final output.
To improve accuracy, a beam search decoder is used; the beams are scored with a 5-gram language model trained on the Common Voice 8 corpus.
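The two ideas in this description can be illustrated with a toy sketch: how CTC output is collapsed into text, and how a language model score can change which beam wins. This is not the model's actual decoder. The blank symbol, the weights `alpha` and `beta`, and all probabilities are hypothetical values chosen for the example.

```python
import math

BLANK = "_"  # hypothetical symbol standing in for the CTC blank token


def ctc_collapse(frame_labels):
    """Merge consecutive repeated labels, then drop blanks (the CTC collapsing rule)."""
    out = []
    prev = None
    for label in frame_labels:
        if label != prev and label != BLANK:
            out.append(label)
        prev = label
    return "".join(out)


def fused_score(acoustic_logp, lm_logp, n_words, alpha=0.5, beta=1.0):
    """Combine acoustic and language model log-probabilities.

    alpha weights the language model and beta is a per-word bonus; both
    values are illustrative, not the ones used by this model.
    """
    return acoustic_logp + alpha * lm_logp + beta * n_words


# Per-frame labels for the word "goed": repeats merge and blanks disappear.
print(ctc_collapse(list("gg_oo_ee_d")))

# Two hypothetical beams with made-up scores: the language model can promote
# a beam that the acoustic model alone ranks slightly lower.
beams = [
    {"text": "ik ben goet", "acoustic": math.log(0.40), "lm": math.log(1e-6)},
    {"text": "ik ben goed", "acoustic": math.log(0.35), "lm": math.log(1e-3)},
]
best = max(
    beams,
    key=lambda b: fused_score(b["acoustic"], b["lm"], len(b["text"].split())),
)
print(best["text"])
```

A blank between two identical letters keeps them separate, which is how CTC can still emit doubled letters.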
Intended uses & limitations
This model can be used to transcribe spoken Dutch, including Flemish, to text (without punctuation).
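A minimal usage sketch follows. It assumes the `transformers` library is installed, along with `pyctcdecode` and `kenlm` if the language model is to be used during decoding, and `ffmpeg` for reading audio files. The model identifier is the one shown on this page. The helper names `load_asr`, `transcribe`, and `main`, and the file name `sample.wav`, are placeholders introduced here.

```python
MODEL_ID = "FremyCompany/xls-r-nl-v1-cv8-lm"


def load_asr(model_id=MODEL_ID):
    """Build a speech recognition pipeline (downloads the model on first use)."""
    # Imported lazily so the helpers below can be used without transformers installed.
    from transformers import pipeline

    return pipeline("automatic-speech-recognition", model=model_id)


def transcribe(audio_path, asr):
    """Return the transcription of one audio file using the given pipeline."""
    result = asr(audio_path)
    return result["text"]


def main():
    asr = load_asr()
    # "sample.wav" is a placeholder path, not a file shipped with the model.
    print(transcribe("sample.wav", asr))
```

Calling `main()` runs the full example. Since this is a 2B parameter model, loading it needs a correspondingly large amount of memory.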
Training and evaluation data
- The model was initialized with the 2B parameter model from Facebook.
- The model was then trained for 2000 iterations (batch size 32) on the dutch configuration of the multilingual_librispeech dataset.
- The model was then trained for 2000 iterations (batch size 32) on the nl configuration of the common_voice_8_0 dataset.
- The model was then trained for 6000 iterations (batch size 32) on the cgn dataset.
- The model was then trained for 6000 iterations (batch size 32) on the nl configuration of the common_voice_8_0 dataset.
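As a rough sense of scale, the schedule above can be totalled as follows. This is a back-of-the-envelope sketch that assumes one iteration processes exactly one batch of 32 utterances with no gradient accumulation, which the card does not state. Utterances seen in more than one epoch are counted each time.

```python
BATCH_SIZE = 32

# (dataset, iterations) for each fine-tuning stage, in the order listed above.
stages = [
    ("multilingual_librispeech (dutch)", 2000),
    ("common_voice_8_0 (nl)", 2000),
    ("cgn", 6000),
    ("common_voice_8_0 (nl)", 6000),
]

total_iterations = sum(steps for _, steps in stages)
total_utterances = total_iterations * BATCH_SIZE

for name, steps in stages:
    print(f"{name}: {steps} iterations, {steps * BATCH_SIZE} utterances")
print(f"total: {total_iterations} iterations, {total_utterances} utterances")
```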
Framework versions
- Transformers 4.17.0.dev0
- Pytorch 1.10.2+cu102
- Datasets 1.18.2.dev0
- Tokenizers 0.11.0
Datasets used to train FremyCompany/xls-r-nl-v1-cv8-lm
Evaluation results
- Test WER on Common Voice 8 (self-reported): 6.690
- Test CER on Common Voice 8 (self-reported): 1.970
- Test WER on Robust Speech Event - Dev Data (self-reported): 20.790
- Test CER on Robust Speech Event - Dev Data (self-reported): 10.720
- Test WER on Robust Speech Event - Test Data (self-reported): 19.710