appvoid/arco-2 · Hugging Face

This is a passthrough merge of arco with an experimental model. It improves on the ARC Challenge (ARC-C), falling only 1.2 points short of modern 3b baseline performance.
If you prefer answering multilingual, general-knowledge, trivially simple questions, choose qwen or llama. If you prefer solving trivially simple English tasks at half the size, choose arco.

prompt
no prompt is set; this is intentional.

	
		
	
	
benchmarks
zero-shot results from state-of-the-art small language models

Parameters	Model	MMLU	ARC-C	HellaSwag	PIQA	Winogrande	Average
0.5b	qwen 2	44.13	28.92	49.05	69.31	56.99	49.68
0.3b	smollm2	25.52	37.71	56.41	71.93	59.27	50.17
0.5b	qwen 2.5	47.29	31.83	52.17	70.29	57.06	51.72
0.5b	arco	26.17	37.29	62.88	74.37	62.27	52.60
0.5b	arco 2	25.51	38.82	63.02	74.70	61.25	52.66
1.24b	llama 3.2	36.75	36.18	63.70	74.54	60.54	54.34
	

	
		
	
	
supporters

trivia
arco also means "arc optimized", hence the focus on this cognitive-based benchmark.

