Collections including paper arxiv:2312.00886

- KTO: Model Alignment as Prospect Theoretic Optimization
  Paper • 2402.01306 • Published • 15
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 48
- SimPO: Simple Preference Optimization with a Reference-Free Reward
  Paper • 2405.14734 • Published • 10
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment
  Paper • 2408.06266 • Published • 9
- aMUSEd: An Open MUSE Reproduction
  Paper • 2401.01808 • Published • 28
- From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations
  Paper • 2401.01885 • Published • 27
- SteinDreamer: Variance Reduction for Text-to-3D Score Distillation via Stein Identity
  Paper • 2401.00604 • Published • 4
- LARP: Language-Agent Role Play for Open-World Games
  Paper • 2312.17653 • Published • 30
- Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model
  Paper • 2311.13231 • Published • 26
- Nash Learning from Human Feedback
  Paper • 2312.00886 • Published • 14
- Secrets of RLHF in Large Language Models Part II: Reward Modeling
  Paper • 2401.06080 • Published • 26
- MusicRL: Aligning Music Generation to Human Preferences
  Paper • 2402.04229 • Published • 16
- Trusted Source Alignment in Large Language Models
  Paper • 2311.06697 • Published • 10
- Diffusion Model Alignment Using Direct Preference Optimization
  Paper • 2311.12908 • Published • 47
- SuperHF: Supervised Iterative Learning from Human Feedback
  Paper • 2310.16763 • Published • 1
- Enhancing Diffusion Models with Text-Encoder Reinforcement Learning
  Paper • 2311.15657 • Published • 2