I'm running a correlation analysis of different benchmarks against the human Arena Elo score.
Your model is missing results for the LLM Leaderboard benchmarks (most notably MMLU and ARC-C).
I'd appreciate it if you could add them.