DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning Paper • 2407.04078 • Published Jul 4, 2024 • 16 upvotes
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning? Paper • 2407.01284 • Published Jul 1, 2024 • 75 upvotes
Reward models on the hub Collection UNMAINTAINED: See RewardBench... A place to collect reward models, an artifact of RLHF that is often not released. • 18 items • Updated Apr 13 • 25 upvotes
ORPO: Monolithic Preference Optimization without Reference Model Paper • 2403.07691 • Published Mar 12, 2024 • 62 upvotes
The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning Paper • 2312.01552 • Published Dec 4, 2023 • 30 upvotes
LiPO: Listwise Preference Optimization through Learning-to-Rank Paper • 2402.01878 • Published Feb 2, 2024 • 19 upvotes
AgentTuning: Enabling Generalized Agent Abilities for LLMs Paper • 2310.12823 • Published Oct 19, 2023 • 35 upvotes
Awesome feedback datasets Collection A curated list of datasets with human or AI feedback, useful for training reward models or for applying techniques like DPO. • 19 items • Updated Apr 12 • 65 upvotes
Llama 2: Open Foundation and Fine-Tuned Chat Models Paper • 2307.09288 • Published Jul 18, 2023 • 242 upvotes
DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models Paper • 2309.03883 • Published Sep 7, 2023 • 33 upvotes