8-bit version

#3 opened by bullerwins

Hi!

In my benchmarks it seems like smaller-parameter models suffer more from quantization. Since GPTQ also supports 8-bit, would it be possible to upload an 8-bit version as well?
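For context, here is a minimal sketch of how an 8-bit GPTQ quantization could be produced via transformers' `GPTQConfig`. The model ID, calibration dataset, and group size are assumptions for illustration, not the uploader's actual recipe:

```python
# Hypothetical sketch: quantizing Llama-3.1-8B-Instruct to 8-bit GPTQ.
# Assumes the optimum/auto-gptq backend is installed; the calibration
# dataset ("c4") and group_size are illustrative choices.
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)

gptq_config = GPTQConfig(
    bits=8,              # GPTQ supports 2/3/4/8-bit; 8-bit is requested here
    dataset="c4",        # calibration dataset (assumption)
    tokenizer=tokenizer,
    group_size=128,
)

quantized = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=gptq_config,
)
quantized.save_pretrained("Llama-3.1-8B-Instruct-GPTQ-INT8")
tokenizer.save_pretrained("Llama-3.1-8B-Instruct-GPTQ-INT8")
```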

These are the results from my MMLU-Pro tests:

| Model | Overall | Biology | Business | Chemistry | Computer Science | Economics | Engineering | Health | History | Law | Math | Philosophy | Physics | Psychology | Other |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| Llama3.1-8B-Instruct | 48.28 | 65.41 | 56.27 | 40.37 | 49.51 | 59.72 | 32.61 | 56.60 | 43.83 | 33.79 | 50.11 | 44.49 | 42.88 | 62.66 | 49.57 |
| Llama3.1-8B-Instruct-GPTQ-INT4 | 39.52 | 57.74 | 42.08 | 31.36 | 42.20 | 49.53 | 25.39 | 47.43 | 36.22 | 28.25 | 37.68 | 36.67 | 34.64 | 55.01 | 43.18 |
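To put a number on the degradation, a throwaway script over the table above (the values are copied from the results, nothing else in the thread):

```python
# Quantify the INT4 quantization hit per MMLU-Pro category,
# using the scores reported in the table above.
categories = [
    "overall", "biology", "business", "chemistry", "computer science",
    "economics", "engineering", "health", "history", "law", "math",
    "philosophy", "physics", "psychology", "other",
]
bf16 = [48.28, 65.41, 56.27, 40.37, 49.51, 59.72, 32.61, 56.60,
        43.83, 33.79, 50.11, 44.49, 42.88, 62.66, 49.57]
int4 = [39.52, 57.74, 42.08, 31.36, 42.20, 49.53, 25.39, 47.43,
        36.22, 28.25, 37.68, 36.67, 34.64, 55.01, 43.18]

for name, a, b in zip(categories, bf16, int4):
    drop = a - b
    print(f"{name:16s} {a:6.2f} -> {b:6.2f}  (-{drop:.2f} pts, -{100 * drop / a:.1f}%)")
# overall: 48.28 -> 39.52, an 8.76-point (~18% relative) drop
```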

Hi @bullerwins ,

Could you please share how you ran these benchmarks? Is there publicly available code for running the MMLU/MMLU-Pro benchmarks? I'd like to test a quantized version of Llama-3.1-8B that I created. Thank you for your time!

I'm using https://github.com/chigkim/Ollama-MMLU-Pro, pointing it at the OpenAI-compatible API endpoint of vLLM.
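For anyone reproducing this setup, a rough sketch of a sanity check before pointing the benchmark at the server. The port and model name are assumptions; see the Ollama-MMLU-Pro README for its actual configuration:

```python
# Rough sketch: verify a local vLLM OpenAI-compatible endpoint before
# running Ollama-MMLU-Pro against it. Assumes the server was started
# with something like: vllm serve meta-llama/Llama-3.1-8B-Instruct
# The port (8000, vLLM's default) and model name are assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM's OpenAI-compatible route
    api_key="EMPTY",                      # vLLM ignores the key by default
)

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Reply with the single word: OK"}],
    max_tokens=5,
    temperature=0.0,
)
print(resp.choices[0].message.content)  # if this prints, the endpoint is live
```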

Thank you for the information!
