
cubby

This is a passthrough merge of arco with an experimental model. It improves on ARC Challenge, falling only 1.2 points short of modern 3b baseline performance.

If you prefer answering multilingual, general-knowledge, trivially simple questions, choose qwen or llama. If you prefer solving trivially simple English tasks at half the size, choose arco.

prompt

no prompt format is set, intentionally.

benchmarks

zero-shot results from state-of-the-art small language models

| Parameters | Model | MMLU | ARC-C | HellaSwag | PIQA | Winogrande | Average |
|------------|-----------|-------|-------|-----------|-------|------------|---------|
| 0.5b | qwen 2 | 44.13 | 28.92 | 49.05 | 69.31 | 56.99 | 49.68 |
| 0.3b | smollm2 | 25.52 | 37.71 | 56.41 | 71.93 | 59.27 | 50.17 |
| 0.5b | qwen 2.5 | 47.29 | 31.83 | 52.17 | 70.29 | 57.06 | 51.72 |
| 0.5b | arco | 26.17 | 37.29 | 62.88 | 74.37 | 62.27 | 52.60 |
| 0.5b | arco 2 | 25.51 | 38.82 | 63.02 | 74.70 | 61.25 | 52.66 |
| 1.24b | llama 3.2 | 36.75 | 36.18 | 63.70 | 74.54 | 60.54 | 54.34 |
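The Average column appears to be the unweighted mean of the five benchmark scores (one or two rows differ by 0.01, presumably from rounding in the source). A quick sanity check, with the scores copied from the table above:

```python
# Zero-shot scores from the table above, in the order:
# MMLU, ARC-C, HellaSwag, PIQA, Winogrande.
scores = {
    "qwen 2":    [44.13, 28.92, 49.05, 69.31, 56.99],
    "smollm2":   [25.52, 37.71, 56.41, 71.93, 59.27],
    "qwen 2.5":  [47.29, 31.83, 52.17, 70.29, 57.06],
    "arco":      [26.17, 37.29, 62.88, 74.37, 62.27],
    "arco 2":    [25.51, 38.82, 63.02, 74.70, 61.25],
    "llama 3.2": [36.75, 36.18, 63.70, 74.54, 60.54],
}

# Average = unweighted mean over the five benchmarks, rounded to two decimals.
averages = {name: round(sum(vals) / len(vals), 2) for name, vals in scores.items()}
print(averages["arco 2"])  # 52.66, matching the table
```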

supporters

Buy Me A Coffee

trivia

arco also stands for "arc optimized," hence the focus on this reasoning-oriented benchmark.

Model size: 514M params · FP16 (Safetensors)
