Issue: Unable to transcribe raven_poe_64kb.mp3

#2
by IAmTheCollector - opened

Transcribing "raven_poe_64kb.wav" works normally.

Windows 10 Pro
Version 22H2
OS build 19045.4170

CUDA 11.8
CUDA added to env
CUDA bin added to PATH

All files downloaded from this repo.
Tested on whisperfile versions tiny.en, small.en, and medium.en; same error on all versions

Whisperfiles renamed to add .exe
Whisperfiles "unblocked" in file properties
Whisperfiles are on a non-system drive/directory
Whisperfiles are run via a Windows shortcut, with the same args for all tested versions:
whisper-tiny.en.llamafile.exe --port 55556 --gpu nvidia

Inference run from the web UI with default settings
Chosen audio file: raven_poe_64kb.mp3
Inference result:
{"error":"failed to read WAV file"}

Running as administrator didn't help.

Cmd output:

whisper_init_from_file_with_params_no_state: loading model from '/zip/ggml-tiny.en.bin'
import_cuda_impl: initializing gpu module...
link_cuda_dso: note: dynamically linking /C/Users/Admin/.llamafile/v/0.8.13/ggml-cuda.dll
ggml_cuda_link: welcome to CUDA SDK with tinyBLAS
link_cuda_dso: GPU support loaded
whisper_init_with_params_no_state: cuda gpu = 1
whisper_init_with_params_no_state: metal gpu = 0
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw = 0
whisper_model_load: loading model
whisper_model_load: n_vocab = 51864
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 384
whisper_model_load: n_audio_head = 6
whisper_model_load: n_audio_layer = 4
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 384
whisper_model_load: n_text_head = 6
whisper_model_load: n_text_layer = 4
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 1 (tiny)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: n_langs = 99
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 2070, compute capability 7.5, VMM: yes
whisper_model_load: CUDA0 total size = 77.11 MB
whisper_model_load: model size = 77.11 MB
whisper_backend_init_gpu: using CUDA backend
whisper_init_state: kv self size = 9.44 MB
whisper_init_state: kv cross size = 9.44 MB
whisper_init_state: kv pad size = 2.36 MB
whisper_init_state: compute buffer (conv) = 13.45 MB
whisper_init_state: compute buffer (encode) = 85.79 MB
whisper_init_state: compute buffer (cross) = 4.14 MB
whisper_init_state: compute buffer (decode) = 98.22 MB

whisper server listening at http://127.0.0.1:55556

Received request: raven_poe_64kb.mp3
P: converting to wav...
P: failed to open audio file: Invalid data (we support .wav, .mp3, .flac, and .ogg)
error: failed to read WAV file

You have to send a 16 kHz WAV to the server right now. Converting .mp3 and friends is currently only supported in the CLI. Support in the server is coming soon!
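As a workaround, you can convert the MP3 to a 16 kHz mono WAV before uploading, e.g. with `ffmpeg -i raven_poe_64kb.mp3 -ar 16000 -ac 1 raven_poe_16k.wav` (assuming ffmpeg is installed). For reference, here is a minimal Python sketch (stdlib only; the filename and tone are illustrative) that produces a WAV in the format the server accepts, and reads the header back to confirm it:

```python
import math
import struct
import wave

# One second of a 440 Hz tone as 16 kHz mono 16-bit PCM --
# the format the whisperfile server currently accepts.
RATE = 16000
with wave.open("test_16k.wav", "wb") as w:
    w.setnchannels(1)     # mono
    w.setsampwidth(2)     # 16-bit samples
    w.setframerate(RATE)  # 16 kHz sample rate
    frames = b"".join(
        struct.pack("<h", int(9000 * math.sin(2 * math.pi * 440 * i / RATE)))
        for i in range(RATE)
    )
    w.writeframes(frames)

# Verify the header matches what the server expects.
with wave.open("test_16k.wav", "rb") as w:
    print(w.getframerate(), w.getnchannels())  # prints: 16000 1
```

Any file whose header reads back as 16000 Hz, mono, 16-bit should get past the "failed to read WAV file" error.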

jartine changed discussion status to closed
