
PairRanker used in LLM-Blender, built on deberta-v3-large. This is the ranker model used in the experiments of the LLM-Blender paper, trained on the MixInstruct dataset for 5 epochs.

Statistics

Context length

| PairRanker type | Source max length | Candidate max length | Total max length |
|---|---|---|---|
| pair-ranker (This model) | 128 | 128 | 384 |
| pair-reward-model | 1224 | 412 | 2048 |
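These limits correspond to the source_maxlength and candidate_maxlength fields of RankerConfig shown in the usage example below. As a minimal illustrative sketch (field names taken from that example), the config for this checkpoint would be set as:

import llm_blender

# Match the ranker config to this checkpoint's context limits.
ranker_config = llm_blender.RankerConfig()
ranker_config.source_maxlength = 128     # max tokens kept from the source/instruction
ranker_config.candidate_maxlength = 128  # max tokens kept from each candidate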

MixInstruct Performance

| Methods | BERTScore | BARTScore | BLEURT | GPT-Rank | Beat Vic(%) | Beat OA(%) | Top-1(%) | Top-2(%) | Top-3(%) |
|---|---|---|---|---|---|---|---|---|---|
| Open Assistant | 74.68 | -3.45 | -0.39 | 3.90 | 62.78 | N/A | 17.35 | 35.67 | 51.98 |
| Vicuna | 69.60 | -3.44 | -0.61 | 4.13 | N/A | 64.77 | 25.47 | 41.23 | 52.88 |
| Alpaca | 71.46 | -3.57 | -0.53 | 4.62 | 56.70 | 61.35 | 15.41 | 29.81 | 44.46 |
| Baize | 65.57 | -3.53 | -0.66 | 4.86 | 52.76 | 56.40 | 14.23 | 26.91 | 38.80 |
| moss | 64.85 | -3.65 | -0.73 | 5.09 | 51.62 | 51.79 | 15.93 | 27.52 | 38.27 |
| ChatGLM | 70.38 | -3.52 | -0.62 | 5.63 | 44.04 | 45.67 | 9.41 | 19.37 | 28.78 |
| Koala | 63.96 | -3.85 | -0.84 | 6.76 | 39.93 | 39.01 | 8.15 | 15.72 | 22.55 |
| Dolly v2 | 62.26 | -3.83 | -0.87 | 6.90 | 33.33 | 31.44 | 5.16 | 10.06 | 16.45 |
| Mosaic MPT | 63.21 | -3.72 | -0.82 | 7.19 | 30.87 | 30.16 | 5.39 | 10.61 | 16.24 |
| StableLM | 62.47 | -4.12 | -0.98 | 8.71 | 21.55 | 19.87 | 2.33 | 4.74 | 7.96 |
| Flan-T5 | 64.92 | -4.57 | -1.23 | 8.81 | 23.89 | 19.93 | 1.30 | 2.87 | 5.32 |
| Oracle(BERTScore) | 77.67 | -3.17 | -0.27 | 3.88 | 54.41 | 38.84 | 20.16 | 38.11 | 53.49 |
| Oracle(BLEURT) | 75.02 | -3.15 | -0.15 | 3.77 | 55.61 | 45.80 | 21.48 | 39.84 | 55.36 |
| Oracle(BARTScore) | 73.23 | -2.87 | -0.38 | 3.69 | 50.32 | 57.01 | 26.10 | 43.70 | 57.33 |
| Oracle(ChatGPT) | 70.32 | -3.33 | -0.51 | 1.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| Random | 66.36 | -3.76 | -0.77 | 6.14 | 37.75 | 36.91 | 11.28 | 20.69 | 29.05 |
| MLM-Scoring | 64.77 | -4.03 | -0.88 | 7.00 | 33.87 | 30.39 | 7.29 | 14.09 | 21.46 |
| SimCLS | 73.14 | -3.22 | -0.38 | 3.50 | 52.11 | 49.93 | 26.72 | 46.24 | 60.72 |
| SummaReranker | 71.60 | -3.25 | -0.41 | 3.66 | 55.63 | 48.46 | 23.89 | 42.44 | 57.54 |
| PairRanker | 72.97 | -3.14 | -0.37 | 3.20 | 54.76 | 57.79 | 30.08 | 50.68 | 65.12 |

Usage Example

Since PairRanker contains some custom layers and tokens, we recommend using it through our llm-blender Python repo. Loading it directly with the Hugging Face from_pretrained() API will raise errors.

  • First install llm-blender
pip install git+https://github.com/yuchenlin/LLM-Blender.git
  • Then use pairranker with the following code:
import llm_blender
# ranker config
ranker_config = llm_blender.RankerConfig()
ranker_config.ranker_type = "pairranker" # only supports pairranker now.
ranker_config.model_type = "deberta"
ranker_config.model_name = "microsoft/deberta-v3-large" # ranker backbone
ranker_config.load_checkpoint = "llm-blender/pair-ranker" # Hugging Face Hub model path or your local ranker checkpoint path
ranker_config.cache_dir = "./hf_models" # hugging face model cache dir
ranker_config.source_maxlength = 128
ranker_config.candidate_maxlength = 128
ranker_config.n_tasks = 1 # number of signals used to train the ranker. This checkpoint is trained with BARTScore only, so it is 1.
fuser_config = llm_blender.GenFuserConfig()
# the fuser config can be ignored since the fuser is not used here; load it if you want
blender_config = llm_blender.BlenderConfig()
# blender config
blender_config.device = "cuda" # blender ranker and fuser device
blender = llm_blender.Blender(blender_config, ranker_config, fuser_config)
  • Then you can rank candidates with the following function
inputs = ["input1", "input2"]
candidates_texts = [["candidate1 for input1", "candidate2 for input1"], ["candidate1 for input2", "candidate2 for input2"]]
ranks = blender.rank(inputs, candidates_texts, return_scores=False, batch_size=2)
# ranks is a list of ranks where ranks[i][j] is the rank of candidate-j for input-i (see the selection sketch after this list)
  • Using pairranker to directly compare two candidates
candidates_A = [cands[0] for cands in candidates_texts]
candidates_B = [cands[1] for cands in candidates_texts]
comparison_results = blender.compare(inputs, candidates_A, candidates_B)
# comparison_results is a list of bools, where comparison_results[i] denotes whether candidates_A[i] is better than candidates_B[i] for inputs[i]
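To turn these outputs into concrete selections, here is a minimal sketch (plain Python/NumPy, not part of the llm-blender API), assuming ranks is the nested list returned above with a lower rank meaning a better candidate, and comparison_results is the list of booleans described above:

import numpy as np

# Pick the best-ranked candidate for each input (lower rank = better).
best_indices = [int(np.argmin(input_ranks)) for input_ranks in ranks]
best_candidates = [cands[i] for cands, i in zip(candidates_texts, best_indices)]

# Pick the pairwise winner for each input from the comparison results.
pairwise_winners = [a if a_is_better else b
                    for a, b, a_is_better in zip(candidates_A, candidates_B, comparison_results)]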

See the LLM-Blender GitHub README.md and the Jupyter notebook blender_usage.ipynb for detailed usage examples.
