We collect open-source datasets and process them into a standard format.
AI & ML interests
Workflow of Reinforcement Learning from Human Feedback (RLHF). Blog: https://rlhflow.github.io/
Models: 6
RLHFlow/ArmoRM-Llama3-8B-v0.1 • Text Classification • 45.5k • 128
RLHFlow/LLaMA3-iterative-DPO-final • Text Generation • 5.57k • 42
RLHFlow/pair-preference-model-LLaMA3-8B • Text Generation • 811 • 33
RLHFlow/LLaMA3-SFT • Text Generation • 2.41k • 7
RLHFlow/DPA-v1-Mistral-7B • Text Generation • 23 • 1
RLHFlow/RewardModel-Mistral-7B-for-DPA-v1 • Text Classification • 551 • 1
Datasets: 42
RLHFlow/ultrafeedback_iter3 • 19.6k
RLHFlow/ultrafeedback_iter2 • 20k
RLHFlow/ultrafeedback_iter1 • 20k
RLHFlow/pair-preference-Skywork-80K-v0.1 • 82k • 296
RLHFlow/ArmoRM-Multi-Objective-Data-v0.2 • 555k • 22
RLHFlow/ArmoRM-Multi-Objective-Data-v0.1 • 569k • 8
RLHFlow/pair_data_v2_80K_wsafety_short • 790k • 302
RLHFlow/pair_data_v2_78_wo_safety • 777k • 2
RLHFlow/pair_data_v2_80K_wsafety • 803k • 1.37k • 1
RLHFlow/preference_data_v2_80K_wsafety • 803k • 277