jartine committed
Commit 710ddd7
1 Parent(s): a79b4b9

Update README.md

Files changed (1)
  1. README.md +28 -0
README.md CHANGED
@@ -54,6 +54,34 @@ Having **trouble?** See the ["Gotchas"
  section](https://github.com/mozilla-ocho/llamafile/?tab=readme-ov-file#gotchas-and-troubleshooting)
  of the llamafile README.

+ ## GPU Acceleration
+
+ The following flags are available to enable GPU support (example below):
+
+ - `--gpu nvidia`
+ - `--gpu metal`
+ - `--gpu amd`
+
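For instance, a GPU-accelerated transcription run might look like the sketch below. The whisperfile and audio file names are placeholders, and the `-f` audio flag follows whisper.cpp's CLI; run your whisperfile with `--help` to confirm the exact options in your build.

```sh
# Placeholder file names; any of the --gpu values listed above can be substituted.
chmod +x whisper-medium.en.llamafile            # one-time, on Linux/macOS
./whisper-medium.en.llamafile --gpu nvidia -f recording.wav
```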
+ The medium and large whisperfiles contain prebuilt dynamic shared
+ objects for Linux and Windows. If you download one of the other models,
+ then you'll need to install the CUDA or ROCm SDK and pass `--recompile`
+ to build a GGML CUDA module for your system.
+
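A rough sketch of that flow, assuming an NVIDIA card with the CUDA SDK already installed (file names are placeholders; the "first time you run" note below suggests the rebuilt module is reused on later runs):

```sh
# First run: rebuild the GGML GPU module against the locally installed SDK.
./whisper-tiny.en.llamafile --recompile --gpu nvidia -f recording.wav
# Later runs can omit --recompile and use the module built above.
./whisper-tiny.en.llamafile --gpu nvidia -f recording.wav
```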
+ On Windows, if you own an NVIDIA GPU, only the graphics card driver
+ needs to be installed. If you have an AMD GPU, you should install the
+ ROCm SDK v6.1 and then pass the flags `--recompile --gpu amd` the
+ first time you run your llamafile.
+
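That first run on a Windows machine with an AMD GPU might look roughly like this (placeholder file names; on Windows the whisperfile may need to be renamed with an `.exe` extension before it will run):

```sh
# Requires the ROCm SDK v6.1 to be installed beforehand.
./whisper-medium.en.llamafile.exe --recompile --gpu amd -f meeting.wav
```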
+ On NVIDIA GPUs, the prebuilt tinyBLAS library is used by default to
+ perform matrix multiplications. This is open source software, but it
+ doesn't go as fast as the closed-source cuBLAS. If you have the CUDA
+ SDK installed on your system, you can pass the `--recompile` flag to
+ build a GGML CUDA library just for your system that uses cuBLAS. This
+ ensures you get maximum performance.
+
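Concretely, the cuBLAS path might be enabled as sketched below, assuming `nvcc` from the CUDA SDK is on your PATH (file names are placeholders):

```sh
# Rebuild the GGML CUDA module against the local CUDA SDK so that cuBLAS is
# used for matrix multiplication instead of the prebuilt tinyBLAS kernels.
./whisper-large-v3.llamafile --recompile --gpu nvidia -f interview.wav
```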
+ For further information, please see the [llamafile
+ README](https://github.com/mozilla-ocho/llamafile/).
+
  ## Documentation
 
  See the [whisperfile