Commit 828f80d by asahi417
1 Parent(s): b1efc0d

Update README.md

  1. README.md +20 -1
README.md CHANGED
@@ -32,7 +32,26 @@ for segment in segments:
  ```
 
  ### Benchmark
- Please refer to the [kotoba-tech/kotoba-whisper-v1.0-faster](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.0-faster) for the detail of speed up [here](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.0-faster#benchmark).
+ We measured the inference speed of different kotoba-whisper-v2.0 implementations on four Japanese speech audio files, using a MacBook Pro with the following specs:
+ - Apple M2 Pro
+ - 32GB
+ - 14-inch, 2023
+ - macOS Sonoma Version 14.4.1 (23E224)
+
+ | audio file | audio duration (min) | [whisper.cpp](https://huggingface.co/kotoba-tech/kotoba-whisper-v2.0-ggml) (sec) | [faster-whisper](https://huggingface.co/kotoba-tech/kotoba-whisper-v2.0-faster) (sec) | [hf pipeline](https://huggingface.co/kotoba-tech/kotoba-whisper-v2.0) (sec) |
+ |--------|------|-----|------|-----|
+ | audio 1 | 50.3 | 581 | 2601 | 807 |
+ | audio 2 | 5.6 | 41 | 73 | 61 |
+ | audio 3 | 4.9 | 30 | 141 | 54 |
+ | audio 4 | 5.6 | 35 | 126 | 69 |
+
+ Scripts to re-run the experiment can be found below:
+ * [whisper.cpp](https://huggingface.co/kotoba-tech/kotoba-whisper-v2.0-ggml/blob/main/benchmark.sh)
+ * [faster-whisper](https://huggingface.co/kotoba-tech/kotoba-whisper-v2.0-faster/blob/main/benchmark.sh)
+ * [hf pipeline](https://huggingface.co/kotoba-tech/kotoba-whisper-v2.0/blob/main/benchmark.sh)
+ Also, whisper.cpp and faster-whisper currently support the [sequential long-form decoding](https://huggingface.co/distil-whisper/distil-large-v3#sequential-long-form),
+ and only the Hugging Face pipeline supports the [chunked long-form decoding](https://huggingface.co/distil-whisper/distil-large-v3#chunked-long-form), which we empirically
+ found to work better than the sequential long-form decoding.
 
 
 
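
Note on reading the benchmark table: the raw timings are easier to compare across files of different length once they are divided by the audio duration. The snippet below is a minimal sketch of that arithmetic, not part of this commit or of the linked `benchmark.sh` scripts. The figures are copied from the table; the names `BENCHMARK`, `IMPLEMENTATIONS`, `real_time_factor`, and `summarize` are hypothetical and exist only for this illustration.

```python
# Benchmark figures copied from the table in this commit.
# Durations are in minutes; processing times are in seconds.
BENCHMARK = {
    "audio 1": {"duration_min": 50.3, "whisper.cpp": 581, "faster-whisper": 2601, "hf pipeline": 807},
    "audio 2": {"duration_min": 5.6, "whisper.cpp": 41, "faster-whisper": 73, "hf pipeline": 61},
    "audio 3": {"duration_min": 4.9, "whisper.cpp": 30, "faster-whisper": 141, "hf pipeline": 54},
    "audio 4": {"duration_min": 5.6, "whisper.cpp": 35, "faster-whisper": 126, "hf pipeline": 69},
}

# Column names used as keys above (illustrative labels, not an API).
IMPLEMENTATIONS = ("whisper.cpp", "faster-whisper", "hf pipeline")


def real_time_factor(processing_sec, duration_min):
    """Return processing time divided by audio length.

    A value below 1.0 means transcription ran faster than real time.
    """
    return processing_sec / (duration_min * 60.0)


def summarize(benchmark):
    """Map each audio file to the real-time factor of every implementation."""
    return {
        name: {
            impl: real_time_factor(row[impl], row["duration_min"])
            for impl in IMPLEMENTATIONS
        }
        for name, row in benchmark.items()
    }


if __name__ == "__main__":
    for name, factors in summarize(BENCHMARK).items():
        fastest = min(factors, key=factors.get)
        formatted = ", ".join(f"{impl}: {value:.2f}" for impl, value in factors.items())
        print(f"{name}: {formatted} (fastest: {fastest})")
```

On these numbers every run finishes faster than real time, and whisper.cpp has the lowest factor on all four files (roughly 0.19 for audio 1). The factor only reflects speed on the listed machine; it says nothing about transcription quality.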