TheBloke committed
Commit 74aa58b
1 Parent(s): ad83b63

Update README.md

Files changed (1)
  1. README.md +8 -22
README.md CHANGED
@@ -118,37 +118,23 @@ Refer to the Provided Files table below to see what files use which methods, and
 
  **Note**: the above RAM figures assume no GPU offloading. If layers are offloaded to the GPU, this will reduce RAM usage and use VRAM instead.
 
- ### Q6_K and Q8_0 files are split and require joining
+ ### All files are split and require joining after download
 
- **Note:** HF does not support uploading files larger than 50GB. Therefore I have uploaded the Q6_K and Q8_0 files as split files.
+ **Note:** HF does not support uploading files larger than 50GB. Therefore I have uploaded all files as split files
 
  <details>
- <summary>Click for instructions regarding Q6_K and Q8_0 files</summary>
+ <summary>Click for instructions regarding joining files</summary>
 
- ### q6_K
- Please download:
- * `falcon-180b-chat.Q6_K.gguf-split-a`
- * `falcon-180b-chat.Q6_K.gguf-split-b`
-
- ### q8_0
- Please download:
- * `falcon-180b-chat.Q8_0.gguf-split-a`
- * `falcon-180b-chat.Q8_0.gguf-split-b`
-
- To join the files, do the following:
+ To join the files, use the following example for each file you're interested in:
 
  Linux and macOS:
  ```
- cat falcon-180b-chat.Q6_K.gguf-split-* > falcon-180b-chat.Q6_K.gguf && rm falcon-180b-chat.Q6_K.gguf-split-*
- cat falcon-180b-chat.Q8_0.gguf-split-* > falcon-180b-chat.Q8_0.gguf && rm falcon-180b-chat.Q8_0.gguf-split-*
+ cat falcon-180b-chat.Q2_K.gguf-split-* > falcon-180b-chat.Q2_K.gguf && rm falcon-180b-chat.Q2_K.gguf-split-*
  ```
  Windows command line:
  ```
- COPY /B falcon-180b-chat.Q6_K.gguf-split-a + falcon-180b-chat.Q6_K.gguf-split-b falcon-180b-chat.Q6_K.gguf
- del falcon-180b-chat.Q6_K.gguf-split-a falcon-180b-chat.Q6_K.gguf-split-b
-
- COPY /B falcon-180b-chat.Q8_0.gguf-split-a + falcon-180b-chat.Q8_0.gguf-split-b falcon-180b-chat.Q8_0.gguf
- del falcon-180b-chat.Q8_0.gguf-split-a falcon-180b-chat.Q8_0.gguf-split-b
+ COPY /B falcon-180b-chat.Q2_K.gguf-split-a + falcon-180b-chat.Q2_K.gguf-split-b falcon-180b-chat.Q2_K.gguf
+ del falcon-180b-chat.Q2_K.gguf-split-a falcon-180b-chat.Q2_K.gguf-split-b
  ```
 
  </details>
@@ -162,7 +148,7 @@ Make sure you are using `llama.cpp` from commit [6381d4e110bd0ec02843a60bbeb8b6f
  For compatibility with older versions of llama.cpp, or for any third-party libraries or clients that haven't yet updated for GGUF, please use GGML files instead.
 
  ```
- ./main -t 10 -ngl 32 -m falcon-180b-chat.q4_K_M.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "User: {prompt}\nAssistant:"
+ ./main -t 10 -ngl 32 -m falcon-180b-chat.q4_K_M.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "User: Write a story about llamas\nAssistant:"
  ```
  Change `-t 10` to the number of physical CPU cores you have. For example if your system has 8 cores/16 threads, use `-t 8`. If offloading all layers to GPU, set `-t 1`.
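
Not part of the commit above, but as context for the updated instructions: the new README joins one quant at a time, and a small Bash loop can join every downloaded split in one pass on Linux and macOS. This is a sketch only, assuming the `<name>.gguf-split-a` / `<name>.gguf-split-b` naming shown in the diff:

```bash
#!/usr/bin/env bash
# Sketch only, not from the commit: join every split GGUF in the current
# directory, assuming the <name>.gguf-split-a / <name>.gguf-split-b naming
# used in the README above.
set -euo pipefail
shopt -s nullglob   # skip the loop entirely if no split files are present

for first in *.gguf-split-a; do
    base="${first%-split-a}"            # e.g. falcon-180b-chat.Q2_K.gguf
    cat "${base}"-split-* > "${base}"   # parts concatenate in alphabetical order
    rm "${base}"-split-*                # remove the parts once the join succeeds
done
```

On Windows, the `COPY /B` + `del` pair shown in the diff performs the same join for a single pair of parts.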