aapot committed on
Commit 9dca1cc
1 Parent(s): 4fffd19

Update README.md

Files changed (1)
  1. README.md +51 -15
README.md CHANGED
@@ -29,6 +29,20 @@ model-index:
    - name: Test CER
      type: cer
      value: 1.40
+   - task:
+       name: Automatic Speech Recognition
+       type: automatic-speech-recognition
+     dataset:
+       name: FLEURS ASR
+       type: google/fleurs
+       args: fi_fi
+     metrics:
+     - name: Test WER
+       type: wer
+       value: 13.99
+     - name: Test CER
+       type: cer
+       value: 6.07
---

# Wav2Vec2-base-fi-voxpopuli-v2 for Finnish ASR
@@ -150,7 +164,9 @@ The pretrained `facebook/wav2vec2-base-fi-voxpopuli-v2` model was initialized wi

## Evaluation results

- Evaluation was done with the [Common Voice 7.0 Finnish test split](https://huggingface.co/datasets/mozilla-foundation/common_voice_7_0) and with the [Common Voice 9.0 Finnish test split](https://huggingface.co/datasets/mozilla-foundation/common_voice_9_0). This model's training data includes the training splits of Common Voice 9.0 but our previous models include the Common Voice 7.0 so we ran tests for both versions. Note: Common Voice doesn't seem to fully preserve the test split as fixed between the dataset versions so it is possible that some of the training examples of Common Voice 9.0 are in the test split of the Common Voice 7.0 and vice versa. Thus, test result comparisons are not fully accurate between the models trained with different Common Voice versions but the comparison should still be meaningful enough.
+ Evaluation was done with the [Common Voice 7.0 Finnish test split](https://huggingface.co/datasets/mozilla-foundation/common_voice_7_0), the [Common Voice 9.0 Finnish test split](https://huggingface.co/datasets/mozilla-foundation/common_voice_9_0) and the [FLEURS ASR Finnish test split](https://huggingface.co/datasets/google/fleurs).
+
+ This model's training data includes the training splits of Common Voice 9.0, but most of our previous models' training data includes Common Voice 7.0, so we ran tests for both Common Voice versions. Note: Common Voice does not seem to keep the test split fixed between dataset versions, so some training examples of Common Voice 9.0 may appear in the test split of Common Voice 7.0 and vice versa. Thus, Common Voice test result comparisons between models trained with different Common Voice versions are not fully accurate, but the comparison should still be meaningful enough.
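
As a minimal sketch of loading the three test splits named above with the 🤗 `datasets` library (an illustration only; the Common Voice datasets are gated on the Hub, so accepting their terms and logging in may be required):

```python
from datasets import load_dataset, Audio

# Common Voice Finnish test splits (gated datasets: accept the terms on the Hub and log in first)
cv7_test = load_dataset("mozilla-foundation/common_voice_7_0", "fi", split="test")
cv9_test = load_dataset("mozilla-foundation/common_voice_9_0", "fi", split="test")

# FLEURS Finnish test split
fleurs_test = load_dataset("google/fleurs", "fi_fi", split="test")

# Wav2Vec2 models expect 16 kHz audio, so resample the audio column before inference
cv7_test = cv7_test.cast_column("audio", Audio(sampling_rate=16_000))
cv9_test = cv9_test.cast_column("audio", Audio(sampling_rate=16_000))
fleurs_test = fleurs_test.cast_column("audio", Audio(sampling_rate=16_000))

print(len(cv7_test), len(cv9_test), len(fleurs_test))
```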
 
 

### Common Voice 7.0 testing

@@ -160,14 +176,15 @@ To evaluate this model, run the `eval.py` script in this repository:
python3 eval.py --model_id Finnish-NLP/wav2vec2-base-fi-voxpopuli-v2-finetuned --dataset mozilla-foundation/common_voice_7_0 --config fi --split test
```

- This model (the third row of the table) achieves the following WER (Word Error Rate) and CER (Character Error Rate) results compared to our other models and their parameter counts:
+ This model (the first row of the table) achieves the following WER (Word Error Rate) and CER (Character Error Rate) results compared to our other models and their parameter counts:

- | | Model parameters | WER (with LM) | WER (without LM) | CER (with LM) | CER (without LM) |
- |----------------------------------------------------|------------------|---------------|------------------|---------------|------------------|
- |Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm-v2 | 1000 million |**4.09** |**9.73** |**0.88** |**1.65** |
- |Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm | 1000 million |5.65 |13.11 |1.20 |2.23 |
- |Finnish-NLP/wav2vec2-base-fi-voxpopuli-v2-finetuned | 95 million |5.85 |13.52 |1.35 |2.44 |
- |Finnish-NLP/wav2vec2-xlsr-300m-finnish-lm | 300 million |8.16 |17.92 |1.97 |3.36 |
+ | | Model parameters | WER (with LM) | WER (without LM) | CER (with LM) | CER (without LM) |
+ |-------------------------------------------------------|------------------|---------------|------------------|---------------|------------------|
+ |Finnish-NLP/wav2vec2-base-fi-voxpopuli-v2-finetuned | 95 million |5.85 |13.52 |1.35 |2.44 |
+ |Finnish-NLP/wav2vec2-large-uralic-voxpopuli-v2-finnish | 300 million |4.13 |**9.66** |0.90 |1.66 |
+ |Finnish-NLP/wav2vec2-xlsr-300m-finnish-lm | 300 million |8.16 |17.92 |1.97 |3.36 |
+ |Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm | 1000 million |5.65 |13.11 |1.20 |2.23 |
+ |Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm-v2 | 1000 million |**4.09** |9.73 |**0.88** |**1.65** |
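
The WER and CER figures above come from the repository's `eval.py` script. A rough, LM-free sketch of the same kind of evaluation with the 🤗 `transformers`, `datasets` and `evaluate` libraries could look like the following (an illustration only, not the actual `eval.py`, which may apply additional text normalization):

```python
import torch
import evaluate  # needs: transformers, datasets, evaluate, jiwer, torch
from datasets import load_dataset, Audio
from transformers import AutoModelForCTC, Wav2Vec2Processor

model_id = "Finnish-NLP/wav2vec2-base-fi-voxpopuli-v2-finetuned"
processor = Wav2Vec2Processor.from_pretrained(model_id)  # tokenizer + feature extractor, no LM
model = AutoModelForCTC.from_pretrained(model_id).eval()

# Common Voice 7.0 Finnish test split, resampled to the 16 kHz the model expects
dataset = load_dataset("mozilla-foundation/common_voice_7_0", "fi", split="test")
dataset = dataset.cast_column("audio", Audio(sampling_rate=16_000))

predictions, references = [], []
for sample in dataset:
    inputs = processor(sample["audio"]["array"], sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    pred_ids = torch.argmax(logits, dim=-1)  # greedy CTC decoding, i.e. the "without LM" setting
    predictions.append(processor.batch_decode(pred_ids)[0].lower())
    references.append(sample["sentence"].lower())

print("WER:", 100 * evaluate.load("wer").compute(predictions=predictions, references=references))
print("CER:", 100 * evaluate.load("cer").compute(predictions=predictions, references=references))
```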
 
### Common Voice 9.0 testing

@@ -177,14 +194,33 @@ To evaluate this model, run the `eval.py` script in this repository:
python3 eval.py --model_id Finnish-NLP/wav2vec2-base-fi-voxpopuli-v2-finetuned --dataset mozilla-foundation/common_voice_9_0 --config fi --split test
```

- This model (the third row of the table) achieves the following WER (Word Error Rate) and CER (Character Error Rate) results compared to our other models and their parameter counts:
+ This model (the first row of the table) achieves the following WER (Word Error Rate) and CER (Character Error Rate) results compared to our other models and their parameter counts:

- | | Model parameters | WER (with LM) | WER (without LM) | CER (with LM) | CER (without LM) |
- |----------------------------------------------------|------------------|---------------|------------------|---------------|------------------|
- |Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm-v2 | 1000 million |**3.72** |**8.96** |**0.80** |**1.52** |
- |Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm | 1000 million |5.35 |13.00 |1.14 |2.20 |
- |Finnish-NLP/wav2vec2-base-fi-voxpopuli-v2-finetuned | 95 million |5.93 |14.08 |1.40 |2.59 |
- |Finnish-NLP/wav2vec2-xlsr-300m-finnish-lm | 300 million |7.42 |16.45 |1.79 |3.07 |
+ | | Model parameters | WER (with LM) | WER (without LM) | CER (with LM) | CER (without LM) |
+ |-------------------------------------------------------|------------------|---------------|------------------|---------------|------------------|
+ |Finnish-NLP/wav2vec2-base-fi-voxpopuli-v2-finetuned | 95 million |5.93 |14.08 |1.40 |2.59 |
+ |Finnish-NLP/wav2vec2-large-uralic-voxpopuli-v2-finnish | 300 million |4.13 |9.83 |0.92 |1.71 |
+ |Finnish-NLP/wav2vec2-xlsr-300m-finnish-lm | 300 million |7.42 |16.45 |1.79 |3.07 |
+ |Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm | 1000 million |5.35 |13.00 |1.14 |2.20 |
+ |Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm-v2 | 1000 million |**3.72** |**8.96** |**0.80** |**1.52** |
+
+ ### FLEURS ASR testing
+
+ To evaluate this model, run the `eval.py` script in this repository:
+
+ ```bash
+ python3 eval.py --model_id Finnish-NLP/wav2vec2-base-fi-voxpopuli-v2-finetuned --dataset google/fleurs --config fi_fi --split test
+ ```
+
+ This model (the first row of the table) achieves the following WER (Word Error Rate) and CER (Character Error Rate) results compared to our other models and their parameter counts:
+
+ | | Model parameters | WER (with LM) | WER (without LM) | CER (with LM) | CER (without LM) |
+ |-------------------------------------------------------|------------------|---------------|------------------|---------------|------------------|
+ |Finnish-NLP/wav2vec2-base-fi-voxpopuli-v2-finetuned | 95 million |13.99 |17.16 |6.07 |6.61 |
+ |Finnish-NLP/wav2vec2-large-uralic-voxpopuli-v2-finnish | 300 million |12.44 |**14.63** |5.77 |6.22 |
+ |Finnish-NLP/wav2vec2-xlsr-300m-finnish-lm | 300 million |17.72 |23.30 |6.78 |7.67 |
+ |Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm | 1000 million |20.34 |16.67 |6.97 |6.35 |
+ |Finnish-NLP/wav2vec2-xlsr-1b-finnish-lm-v2 | 1000 million |**12.11** |14.89 |**5.65** |**6.06** |
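
The "with LM" columns refer to decoding the CTC output with the n-gram language model shipped in each model repository, while "without LM" is plain greedy decoding. A minimal sketch of both settings, assuming this checkpoint bundles a `pyctcdecode`-compatible LM (as the "with LM" results suggest; the `pyctcdecode` and `kenlm` packages must be installed):

```python
import torch
from datasets import load_dataset, Audio
from transformers import AutoModelForCTC, Wav2Vec2ProcessorWithLM

model_id = "Finnish-NLP/wav2vec2-base-fi-voxpopuli-v2-finetuned"

# Loads the tokenizer, feature extractor and the repository's n-gram LM decoder
# (requires the pyctcdecode and kenlm packages).
processor = Wav2Vec2ProcessorWithLM.from_pretrained(model_id)
model = AutoModelForCTC.from_pretrained(model_id).eval()

# One FLEURS Finnish test sample, resampled to 16 kHz
fleurs_test = load_dataset("google/fleurs", "fi_fi", split="test")
sample = fleurs_test.cast_column("audio", Audio(sampling_rate=16_000))[0]

inputs = processor(sample["audio"]["array"], sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# "with LM": beam-search decoding over the logits with the n-gram LM
lm_text = processor.batch_decode(logits.numpy()).text[0]

# "without LM": plain greedy CTC decoding with the tokenizer only
greedy_ids = torch.argmax(logits, dim=-1)
greedy_text = processor.tokenizer.batch_decode(greedy_ids)[0]

print("with LM   :", lm_text)
print("without LM:", greedy_text)
```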
 
## Team Members
 