jartine committed
Commit 710ddd7
1 Parent(s): a79b4b9

Update README.md

Files changed (1)
  1. README.md +28 -0
README.md CHANGED
@@ -54,6 +54,34 @@ Having **trouble?** See the ["Gotchas"
  section](https://github.com/mozilla-ocho/llamafile/?tab=readme-ov-file#gotchas-and-troubleshooting)
  of the llamafile README.

+ ## GPU Acceleration
+
+ The following flags are available to enable GPU support (example below):
+
+ - `--gpu nvidia`
+ - `--gpu metal`
+ - `--gpu amd`
+
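For instance, a GPU-accelerated transcription run might look like the sketch below. The whisperfile and audio file names are placeholders, and the `-f` audio flag follows whisper.cpp's CLI; run your whisperfile with `--help` to confirm the exact options in your build.

```sh
# Placeholder file names; any of the --gpu values listed above can be substituted.
chmod +x whisper-medium.en.llamafile            # one-time, on Linux/macOS
./whisper-medium.en.llamafile --gpu nvidia -f recording.wav
```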
+ The medium and large whisperfiles contain prebuilt dynamic shared
+ objects for Linux and Windows. If you download one of the other models,
+ then you'll need to install the CUDA or ROCm SDK and pass `--recompile`
+ to build a GGML CUDA module for your system.
+
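A rough sketch of that flow, assuming an NVIDIA card with the CUDA SDK already installed (file names are placeholders; the "first time you run" note below suggests the rebuilt module is reused on later runs):

```sh
# First run: rebuild the GGML GPU module against the locally installed SDK.
./whisper-tiny.en.llamafile --recompile --gpu nvidia -f recording.wav
# Later runs can omit --recompile and use the module built above.
./whisper-tiny.en.llamafile --gpu nvidia -f recording.wav
```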
+ On Windows, if you own an NVIDIA GPU, only the graphics card driver
+ needs to be installed. If you have an AMD GPU, you should install the
+ ROCm SDK v6.1 and then pass the flags `--recompile --gpu amd` the
+ first time you run your llamafile.
+
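That first run on a Windows machine with an AMD GPU might look roughly like this (placeholder file names; on Windows the whisperfile may need to be renamed with an `.exe` extension before it will run):

```sh
# Requires the ROCm SDK v6.1 to be installed beforehand.
./whisper-medium.en.llamafile.exe --recompile --gpu amd -f meeting.wav
```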
+ On NVIDIA GPUs, the prebuilt tinyBLAS library is used by default to
+ perform matrix multiplications. This is open source software, but it
+ doesn't go as fast as the closed-source cuBLAS. If you have the CUDA
+ SDK installed on your system, you can pass the `--recompile` flag to
+ build a GGML CUDA library just for your system that uses cuBLAS. This
+ ensures you get maximum performance.
+
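Concretely, the cuBLAS path might be enabled as sketched below, assuming `nvcc` from the CUDA SDK is on your PATH (file names are placeholders):

```sh
# Rebuild the GGML CUDA module against the local CUDA SDK so that cuBLAS is
# used for matrix multiplication instead of the prebuilt tinyBLAS kernels.
./whisper-large-v3.llamafile --recompile --gpu nvidia -f interview.wav
```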
+ For further information, please see the [llamafile
+ README](https://github.com/mozilla-ocho/llamafile/).
+
  ## Documentation
 
  See the [whisperfile