munish0838 committed on
Commit
3e8d1a8
1 Parent(s): e991428

Create README.md

Files changed (1)
  1. README.md +31 -0
README.md ADDED
@@ -0,0 +1,31 @@
+ ---
+ license: llama3
+ pipeline_tag: text-generation
+ base_model: Salesforce/LLaMA-3-8B-SFR-SFT-R
+ ---
+ # LLaMA-3-8B-SFR-SFT-R-GGUF
+ This is a quantized version of [Salesforce/LLaMA-3-8B-SFR-SFT-R](https://huggingface.co/Salesforce/LLaMA-3-8B-SFR-SFT-R), created using llama.cpp.
+
+
+ # Model Description
+ This is the SFT model for Salesforce/SFR-Iterative-DPO-LLaMA-3-8B-R.
+
+ ## Model Releases
+ - [SFT model](https://huggingface.co/Salesforce/LLaMA-3-8B-SFR-SFT-R)
+ - [Reward model](https://huggingface.co/Salesforce/LLaMA-3-8B-SFR-RM-R)
+ - [RLHF model](https://huggingface.co/Salesforce/LLaMA-3-8B-SFR-Iterative-DPO-R)
+
+
+ ## Original Model Citation
+ Please cite our technical report if you find our model useful for your research or product.
+
+ ```bibtex
+ @misc{dong2024rlhf,
+   title={RLHF Workflow: From Reward Modeling to Online RLHF},
+   author={Hanze Dong and Wei Xiong and Bo Pang and Haoxiang Wang and Han Zhao and Yingbo Zhou and Nan Jiang and Doyen Sahoo and Caiming Xiong and Tong Zhang},
+   year={2024},
+   eprint={2405.07863},
+   archivePrefix={arXiv},
+   primaryClass={cs.LG}
+ }
+ ```