@singhsidhukuldeep on Hugging Face: "🎉 A new LLM is launched! 🚀 After checking if it's open-source or not, 🤔…"


singhsidhukuldeep

posted an update May 16

🎉 A new LLM is launched! 🚀
After checking if it's open-source or not, 🤔
you rush to see the benchmarks... 🏃‍♂️💨

Which benchmark does everyone check first? 🔍

MMLU (Massive Multitask Language Understanding)? 📚

Benchmarks like MMLU are reaching saturation, and most of the time the scores do not translate to real-world use cases! 🌐❗

Meet MMLU-Pro, released by TIGER-Lab on @huggingface ! 🐯🌍

🧪 12,217 questions across biology, business, chemistry, computer science, economics, engineering, health, history, law, mathematics, philosophy, physics, and psychology, all carefully validated by humans 🧑‍🔬

🔟 Goes to 10 options per question instead of 4. More options make the evaluation more realistic and cut the payoff from random guessing 🎯
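
To see how much the extra options matter for guessing, here is a minimal sketch. It assumes every question has exactly the stated number of options with one correct answer and that the guesser picks uniformly at random; the helper name `random_guess_accuracy` is made up for illustration.

```python
def random_guess_accuracy(num_options: int) -> float:
    """Expected accuracy when picking one option uniformly at random.

    Assumes exactly one correct answer per question (illustrative helper).
    """
    if num_options < 1:
        raise ValueError("num_options must be at least 1")
    return 1.0 / num_options


mmlu_baseline = random_guess_accuracy(4)       # 4 options, as in MMLU
mmlu_pro_baseline = random_guess_accuracy(10)  # 10 options, as in MMLU-Pro

print(f"MMLU guess baseline:     {mmlu_baseline:.0%}")
print(f"MMLU-Pro guess baseline: {mmlu_pro_baseline:.0%}")
```

Under these assumptions, the floor a model gets "for free" falls from a quarter of the questions to a tenth.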

📊 56% of questions come from MMLU, 34% from STEM websites, and the rest from TheoremQA and SciBench 📈

🤖 LLMs with weak chain-of-thought reasoning tend to score lower, which suggests the benchmark is more challenging and more representative of real-world expectations 🧠💡

Any guess who tops it and who bombs it? 🤔📉📈

GPT-4o drops by 17 percentage points (from 0.887 to 0.7149) 📉
Llama-3-70B drops by 27 percentage points (from 0.820 to 0.5541) 📉
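
The drops quoted above are differences between the two accuracy scores. A quick sketch, using only the four scores from this post, separates the absolute drop (in percentage points) from the relative drop (as a share of the original MMLU score); the function name `score_drop` is just for illustration.

```python
# (MMLU score, MMLU-Pro score) as quoted in the post
scores = {
    "GPT-4o": (0.887, 0.7149),
    "Llama-3-70B": (0.820, 0.5541),
}


def score_drop(mmlu: float, mmlu_pro: float) -> tuple[float, float]:
    """Return (absolute drop in percentage points, relative drop in percent)."""
    absolute_points = (mmlu - mmlu_pro) * 100
    relative_percent = (mmlu - mmlu_pro) / mmlu * 100
    return absolute_points, relative_percent


for model, (mmlu, mmlu_pro) in scores.items():
    points, relative = score_drop(mmlu, mmlu_pro)
    print(f"{model}: -{points:.1f} points ({relative:.1f}% relative)")
```

Measured relative to where each model started, the drop works out to roughly 19% for GPT-4o and roughly 32% for Llama-3-70B.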

🔗 TIGER-Lab/MMLU-Pro

clem

May 16

very cool! cc @clefourrier

Zachyyypoo

May 16

Great post!
