Mozilla/granite-34b-code-instruct-llamafile

jartine committed on May 26

Commit ade73f1 • 1 Parent(s): de1686b

Update README.md

Files changed (1)

README.md +9 -0

@@ -268,6 +268,15 @@ context size to be available with llamafile for any given model, you can
 pass the `-c 0` flag. The default temperature for these llamafiles is 0.
 It can be changed, e.g. `--temp 0.8`.
+## Benchmarks
+|                                   cpu\_info |                           model\_filename |       size |          test |             t/s |
+| -----------------------------------------: | ---------------------------------------: | ---------: | ------------: | --------------: |
+| AMD Ryzen Threadripper PRO 7995WX (znver4) |           granite-34b-code-instruct.Q8\_0 |  33.82 GiB |         pp512 |           94.34 |
+| AMD Ryzen Threadripper PRO 7995WX (znver4) |           granite-34b-code-instruct.Q8\_0 |  33.82 GiB |          tg16 |            5.61 |
+| AMD Ryzen Threadripper PRO 7995WX (znver4) |           granite-34b-code-instruct.Q5\_0 |  22.03 GiB |         pp512 |           95.08 |
+| AMD Ryzen Threadripper PRO 7995WX (znver4) |           granite-34b-code-instruct.Q5\_0 |  22.03 GiB |          tg16 |            7.78 |
 ## About Quantization
 Our own evaluation of this model leads us to believe that it works best
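
As a reading aid for the benchmark rows added in this commit, the sketch below compares the two quantizations. The numbers are copied from the table; the interpretation of `pp512` as prompt processing over 512 tokens and `tg16` as generation of 16 tokens is an assumption based on the usual llama-bench naming, and the `benchmarks` dict and `ratio` helper are hypothetical names used only for illustration.

```python
# Throughput figures (tokens per second) and file sizes copied from the
# benchmark table in this diff. The dict layout is an illustrative choice.
benchmarks = {
    "Q8_0": {"size_gib": 33.82, "pp512": 94.34, "tg16": 5.61},
    "Q5_0": {"size_gib": 22.03, "pp512": 95.08, "tg16": 7.78},
}


def ratio(column: str, a: str = "Q5_0", b: str = "Q8_0") -> float:
    """Return the value of `column` for quantization `a` divided by that of `b`."""
    return benchmarks[a][column] / benchmarks[b][column]


if __name__ == "__main__":
    for column in ("size_gib", "pp512", "tg16"):
        print(f"{column}: Q5_0 / Q8_0 = {ratio(column):.2f}")
```

On these figures, Q5_0 is about 0.65x the size of Q8_0, roughly level on `pp512` (about 1.01x), and about 1.39x faster on `tg16`. This covers a single CPU and says nothing about output quality.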