The benchmark scores are completely contradictory and need to be verified.

#16
Opened by JesusCrist

The two model cards list exactly the same evaluation benchmarks, but report completely different scores.

In https://huggingface.co/google/gemma-2b:

[Screenshot: benchmark scores reported on this model card]

In https://huggingface.co/google/gemma-1.1-2b-it:

[Screenshot: benchmark scores reported on this model card]
