RLHFlow's Collections
RM-Bradley-Terry
Updated Apr 29
We train the reward model by maximum likelihood estimation under the Bradley-Terry model.
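The Bradley-Terry model scores a preference pair as P(y_w ≻ y_l | x) = σ(r(x, y_w) − r(x, y_l)), where σ is the logistic sigmoid, y_w is the preferred response and y_l the rejected one. Maximum likelihood estimation therefore means minimizing the average −log σ(r(x, y_w) − r(x, y_l)) over the training pairs. The Python sketch below illustrates only that objective. The helper names (`bt_probability`, `bt_nll`, `fit_scores`) are hypothetical, and `fit_scores` fits one free score per item on toy data rather than training a neural reward model, where r would instead be the network's scalar output for a prompt-response pair.

```python
import math
from typing import List, Sequence, Tuple


def bt_probability(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry probability that `chosen` is preferred: sigmoid(r_chosen - r_rejected)."""
    margin = r_chosen - r_rejected
    # Numerically stable sigmoid for both signs of the margin.
    if margin >= 0:
        return 1.0 / (1.0 + math.exp(-margin))
    e = math.exp(margin)
    return e / (1.0 + e)


def bt_nll(r_chosen: float, r_rejected: float) -> float:
    """Negative log-likelihood of one preference pair: -log sigmoid(r_chosen - r_rejected)."""
    margin = r_chosen - r_rejected
    # -log(sigmoid(m)) = log(1 + exp(-m)), rearranged to avoid overflow when m is very negative.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def fit_scores(
    pairs: Sequence[Tuple[int, int]], n_items: int, lr: float = 0.1, steps: int = 2000
) -> List[float]:
    """Toy MLE (hypothetical helper): one free scalar score per item, fit to
    (winner, loser) index pairs by gradient descent on the mean NLL."""
    scores = [0.0] * n_items
    for _ in range(steps):
        grads = [0.0] * n_items
        for winner, loser in pairs:
            # d(-log sigmoid(m))/dm = -(1 - sigmoid(m)), with m = scores[winner] - scores[loser].
            g = -(1.0 - bt_probability(scores[winner], scores[loser]))
            grads[winner] += g  # dm/d(score of winner) = +1
            grads[loser] -= g   # dm/d(score of loser) = -1
        for i in range(n_items):
            scores[i] -= lr * grads[i] / len(pairs)
    return scores


if __name__ == "__main__":
    # Item 0 beats item 1 in 3 of 4 comparisons; item 1 beats item 2 in 3 of 4.
    toy_pairs = [(0, 1)] * 3 + [(1, 0)] + [(1, 2)] * 3 + [(2, 1)]
    print(fit_scores(toy_pairs, n_items=3))
```

Because only reward differences enter the likelihood, Bradley-Terry scores are identified only up to an additive constant; in this sketch each pair's gradient contributions cancel, so the scores keep their initial mean of zero. On the toy data, the fitted gaps s0 − s1 and s1 − s2 approach ln 3 ≈ 1.10, the log-odds of a 3-to-1 win rate.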
sfairXC/FsfairX-LLaMA3-RM-v0.1 • Text Classification • Updated 27 days ago • 15.2k downloads • 48 likes
hendrydong/preference_700K • Dataset • Updated Sep 28 • 700k rows • 1.23k downloads • 6 likes
weqweasdas/RM-Mistral-7B • Text Classification • Updated Mar 31 • 288 downloads • 22 likes
weqweasdas/preference_dataset_mixture2_and_safe_pku • Dataset • Updated Apr 29 • 555k rows • 57 downloads • 9 likes
weqweasdas/RM-Gemma-2B • Text Classification • Updated Mar 22 • 146 downloads • 16 likes