Update README.md

README.md

Having **trouble?** See the ["Gotchas"
section](https://github.com/mozilla-ocho/llamafile/?tab=readme-ov-file#gotchas-and-troubleshooting)
of the llamafile README.
## GPU Acceleration
The following flags are available to enable GPU support:

- `--gpu nvidia`
- `--gpu metal`
- `--gpu amd`
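
For example, the backend is picked at run time by passing one of these flags when invoking a whisperfile. The model filename and the `-f` audio-input flag below are illustrative assumptions, not something this section specifies:

```sh
# Offload work to an NVIDIA GPU (hypothetical whisperfile name and input file).
./whisper-medium.en.llamafile --gpu nvidia -f recording.wav

# On an Apple Silicon Mac, select the Metal backend instead.
./whisper-medium.en.llamafile --gpu metal -f recording.wav
```
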
The medium and large whisperfiles contain prebuilt dynamic shared
objects for Linux and Windows. If you download one of the other models,
then you'll need to install the CUDA or ROCm SDK and pass `--recompile`
to build a GGML CUDA module for your system.
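
As a sketch of that recompile path for one of the smaller models, assuming the relevant SDK is already installed and using placeholder model and audio filenames:

```sh
# Requires the CUDA SDK (NVIDIA) or ROCm SDK (AMD) to be installed first.
# --recompile builds a GGML GPU module tailored to this machine.
./whisper-tiny.en.llamafile --recompile --gpu nvidia -f recording.wav
```
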
On Windows, only the graphics card driver needs to be installed if you
own an NVIDIA GPU. If you have an AMD GPU on Windows, you should
install the ROCm SDK v6.1 and then pass the flags `--recompile --gpu amd`
the first time you run your llamafile.
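
On Windows with an AMD GPU, that first run might look like the sketch below (placeholder filenames; later runs can omit `--recompile` since it only needs to happen once):

```sh
# First run after installing the ROCm SDK v6.1 (per the paragraph above).
./whisper-medium.en.llamafile --recompile --gpu amd -f recording.wav

# Later runs can reuse the module built on the first run.
./whisper-medium.en.llamafile --gpu amd -f recording.wav
```
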
On NVIDIA GPUs, the prebuilt tinyBLAS library is used by default to
perform matrix multiplications. It is open source software, but it
doesn't go as fast as the closed-source cuBLAS. If you have the CUDA
SDK installed on your system, you can pass the `--recompile` flag to
build a GGML CUDA library just for your system that uses cuBLAS. This
ensures you get maximum performance.
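
So, if the CUDA SDK is present, a one-time `--recompile` run is how you switch from the prebuilt tinyBLAS to cuBLAS; for instance (placeholder filenames again):

```sh
# One-time rebuild against the local CUDA SDK so matrix multiplication uses cuBLAS.
./whisper-large-v3.llamafile --recompile --gpu nvidia -f recording.wav
```
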
For further information, please see the [llamafile
README](https://github.com/mozilla-ocho/llamafile/).

## Documentation

See the [whisperfile