DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning Paper • 2407.04078 • Published Jul 4, 2024 • 16 upvotes
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning? Paper • 2407.01284 • Published Jul 1, 2024 • 75 upvotes
Reward models on the hub Collection UNMAINTAINED: See RewardBench... A place to collect reward models, an artifact of RLHF that is often not released. • 18 items • Updated Apr 13 • 25 upvotes
ORPO: Monolithic Preference Optimization without Reference Model Paper • 2403.07691 • Published Mar 12, 2024 • 62 upvotes
The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning Paper • 2312.01552 • Published Dec 4, 2023 • 30 upvotes
LiPO: Listwise Preference Optimization through Learning-to-Rank Paper • 2402.01878 • Published Feb 2, 2024 • 19 upvotes
AgentTuning: Enabling Generalized Agent Abilities for LLMs Paper • 2310.12823 • Published Oct 19, 2023 • 35 upvotes
Awesome feedback datasets Collection A curated list of datasets with human or AI feedback, useful for training reward models or for applying techniques like DPO. • 19 items • Updated Apr 12 • 65 upvotes
Llama 2: Open Foundation and Fine-Tuned Chat Models Paper • 2307.09288 • Published Jul 18, 2023 • 242 upvotes
DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models Paper • 2309.03883 • Published Sep 7, 2023 • 33 upvotes