Fresh Alpasta, done Al Dente!

It's da logical choice! Now with a similar personality emulation quality to GPT4-X-Alpasta-30b!

Model Info:

ChanSung's Alpaca-LoRA-30B-elina merged with Open Assistant's second finetune

Benchmarks (perplexity, lower is better):

Wikitext2: 4.662261962890625

PTB: 24.547462463378906

C4: 7.05504846572876

4bit:

Wikitext2: 5.016242980957031

PTB: 25.576189041137695

C4: 7.332120418548584

~ Thanks to askmyteapot for performing these benchmarks!
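To put the 4bit numbers in context, here is a small sketch that computes the relative change against the first set of figures. It assumes the first group is the unquantized baseline and that all values are perplexities; the card does not state either explicitly, and the variable names are just for illustration.

```python
# Figures copied from the benchmark lists above.
# Assumption: the first group is the unquantized baseline.
baseline = {
    "Wikitext2": 4.662261962890625,
    "PTB": 24.547462463378906,
    "C4": 7.05504846572876,
}
four_bit = {
    "Wikitext2": 5.016242980957031,
    "PTB": 25.576189041137695,
    "C4": 7.332120418548584,
}

# Percentage increase of the 4bit figure over the baseline, per dataset.
pct_increase = {
    name: (four_bit[name] / baseline[name] - 1.0) * 100.0
    for name in baseline
}

for name, pct in pct_increase.items():
    print(f"{name}: +{pct:.2f}%")
```

Under those assumptions, the 4bit figures are roughly 4 to 8 percent higher than the baseline, with Wikitext2 showing the largest gap.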

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric | Value |
|---|---|
| Avg. | 57.05 |
| ARC (25-shot) | 60.58 |
| HellaSwag (10-shot) | 81.81 |
| MMLU (5-shot) | 56.63 |
| TruthfulQA (0-shot) | 48.38 |
| Winogrande (5-shot) | 78.14 |
| GSM8K (5-shot) | 26.76 |
| DROP (3-shot) | 47.06 |
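
The Avg. figure is consistent with an unweighted mean of the seven benchmark scores. A quick check, assuming that is how it was computed (the card itself does not say):

```python
# Scores copied from the leaderboard results above.
scores = {
    "ARC (25-shot)": 60.58,
    "HellaSwag (10-shot)": 81.81,
    "MMLU (5-shot)": 56.63,
    "TruthfulQA (0-shot)": 48.38,
    "Winogrande (5-shot)": 78.14,
    "GSM8K (5-shot)": 26.76,
    "DROP (3-shot)": 47.06,
}

# Assumption: Avg. is the plain arithmetic mean, with no weighting.
average = sum(scores.values()) / len(scores)
print(f"Avg. {average:.2f}")
```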