library_name: peft
license: mit
language:
- en
pipeline_tag: sentence-similarity
tags:
- text-embedding
- embeddings
- information-retrieval
- beir
- text-classification
- language-model
- text-clustering
- text-semantic-similarity
- text-evaluation
- text-reranking
- feature-extraction
- sentence-similarity
- Sentence Similarity
- natural_questions
- ms_marco
- fever
- hotpot_qa
- mteb
model-index: | |
- name: LLM2Vec-Sheared-LLaMA-supervised | |
results: | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/amazon_counterfactual | |
name: MTEB AmazonCounterfactualClassification (en) | |
config: en | |
split: test | |
revision: e8379541af4e31359cca9fbcf4b00f2671dba205 | |
metrics: | |
- type: accuracy | |
value: 77.41791044776119 | |
- type: ap | |
value: 41.45458580415683 | |
- type: f1 | |
value: 71.63305447032735 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/amazon_polarity | |
name: MTEB AmazonPolarityClassification | |
config: default | |
split: test | |
revision: e2d317d38cd51312af73b3d32a06d1a08b442046 | |
metrics: | |
- type: accuracy | |
value: 82.0527 | |
- type: ap | |
value: 77.3222852456055 | |
- type: f1 | |
value: 81.97981459031165 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/amazon_reviews_multi | |
name: MTEB AmazonReviewsClassification (en) | |
config: en | |
split: test | |
revision: 1399c76144fd37290681b995c656ef9b2e06e26d | |
metrics: | |
- type: accuracy | |
value: 40.806000000000004 | |
- type: f1 | |
value: 40.3299129176701 | |
- task: | |
type: Retrieval | |
dataset: | |
type: arguana | |
name: MTEB ArguAna | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 25.391000000000002 | |
- type: map_at_10 | |
value: 41.919000000000004 | |
- type: map_at_100 | |
value: 42.846000000000004 | |
- type: map_at_1000 | |
value: 42.851 | |
- type: map_at_3 | |
value: 36.260999999999996 | |
- type: map_at_5 | |
value: 39.528999999999996 | |
- type: mrr_at_1 | |
value: 26.245 | |
- type: mrr_at_10 | |
value: 42.215 | |
- type: mrr_at_100 | |
value: 43.135 | |
- type: mrr_at_1000 | |
value: 43.14 | |
- type: mrr_at_3 | |
value: 36.546 | |
- type: mrr_at_5 | |
value: 39.782000000000004 | |
- type: ndcg_at_1 | |
value: 25.391000000000002 | |
- type: ndcg_at_10 | |
value: 51.663000000000004 | |
- type: ndcg_at_100 | |
value: 55.419 | |
- type: ndcg_at_1000 | |
value: 55.517 | |
- type: ndcg_at_3 | |
value: 39.96 | |
- type: ndcg_at_5 | |
value: 45.909 | |
- type: precision_at_1 | |
value: 25.391000000000002 | |
- type: precision_at_10 | |
value: 8.3 | |
- type: precision_at_100 | |
value: 0.989 | |
- type: precision_at_1000 | |
value: 0.1 | |
- type: precision_at_3 | |
value: 16.904 | |
- type: precision_at_5 | |
value: 13.058 | |
- type: recall_at_1 | |
value: 25.391000000000002 | |
- type: recall_at_10 | |
value: 83.001 | |
- type: recall_at_100 | |
value: 98.933 | |
- type: recall_at_1000 | |
value: 99.644 | |
- type: recall_at_3 | |
value: 50.711 | |
- type: recall_at_5 | |
value: 65.292 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/arxiv-clustering-p2p | |
name: MTEB ArxivClusteringP2P | |
config: default | |
split: test | |
revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d | |
metrics: | |
- type: v_measure | |
value: 43.472186058302285 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/arxiv-clustering-s2s | |
name: MTEB ArxivClusteringS2S | |
config: default | |
split: test | |
revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 | |
metrics: | |
- type: v_measure | |
value: 39.846039374129546 | |
- task: | |
type: Reranking | |
dataset: | |
type: mteb/askubuntudupquestions-reranking | |
name: MTEB AskUbuntuDupQuestions | |
config: default | |
split: test | |
revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 | |
metrics: | |
- type: map | |
value: 60.713811638804174 | |
- type: mrr | |
value: 73.38906476718111 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/biosses-sts | |
name: MTEB BIOSSES | |
config: default | |
split: test | |
revision: d3fb88f8f02e40887cd149695127462bbcf29b4a | |
metrics: | |
- type: cos_sim_spearman | |
value: 85.88328221005123 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/banking77 | |
name: MTEB Banking77Classification | |
config: default | |
split: test | |
revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 | |
metrics: | |
- type: accuracy | |
value: 86.00974025974025 | |
- type: f1 | |
value: 85.97349359388288 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/biorxiv-clustering-p2p | |
name: MTEB BiorxivClusteringP2P | |
config: default | |
split: test | |
revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 | |
metrics: | |
- type: v_measure | |
value: 37.102075665637685 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/biorxiv-clustering-s2s | |
name: MTEB BiorxivClusteringS2S | |
config: default | |
split: test | |
revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 | |
metrics: | |
- type: v_measure | |
value: 34.27583239919031 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/android | |
name: MTEB CQADupstackAndroidRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 33.043 | |
- type: map_at_10 | |
value: 44.515 | |
- type: map_at_100 | |
value: 45.967999999999996 | |
- type: map_at_1000 | |
value: 46.098 | |
- type: map_at_3 | |
value: 40.285 | |
- type: map_at_5 | |
value: 42.841 | |
- type: mrr_at_1 | |
value: 40.2 | |
- type: mrr_at_10 | |
value: 50.233000000000004 | |
- type: mrr_at_100 | |
value: 50.938 | |
- type: mrr_at_1000 | |
value: 50.978 | |
- type: mrr_at_3 | |
value: 47.353 | |
- type: mrr_at_5 | |
value: 49.034 | |
- type: ndcg_at_1 | |
value: 40.2 | |
- type: ndcg_at_10 | |
value: 51.096 | |
- type: ndcg_at_100 | |
value: 56.267999999999994 | |
- type: ndcg_at_1000 | |
value: 58.092999999999996 | |
- type: ndcg_at_3 | |
value: 45.09 | |
- type: ndcg_at_5 | |
value: 48.198 | |
- type: precision_at_1 | |
value: 40.2 | |
- type: precision_at_10 | |
value: 9.843 | |
- type: precision_at_100 | |
value: 1.546 | |
- type: precision_at_1000 | |
value: 0.20400000000000001 | |
- type: precision_at_3 | |
value: 21.507 | |
- type: precision_at_5 | |
value: 15.966 | |
- type: recall_at_1 | |
value: 33.043 | |
- type: recall_at_10 | |
value: 63.871 | |
- type: recall_at_100 | |
value: 85.527 | |
- type: recall_at_1000 | |
value: 96.936 | |
- type: recall_at_3 | |
value: 46.859 | |
- type: recall_at_5 | |
value: 55.116 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/english | |
name: MTEB CQADupstackEnglishRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 31.924000000000003 | |
- type: map_at_10 | |
value: 42.298 | |
- type: map_at_100 | |
value: 43.589 | |
- type: map_at_1000 | |
value: 43.724000000000004 | |
- type: map_at_3 | |
value: 39.739999999999995 | |
- type: map_at_5 | |
value: 41.131 | |
- type: mrr_at_1 | |
value: 40.064 | |
- type: mrr_at_10 | |
value: 48.4 | |
- type: mrr_at_100 | |
value: 49.07 | |
- type: mrr_at_1000 | |
value: 49.113 | |
- type: mrr_at_3 | |
value: 46.635 | |
- type: mrr_at_5 | |
value: 47.549 | |
- type: ndcg_at_1 | |
value: 40.064 | |
- type: ndcg_at_10 | |
value: 47.686 | |
- type: ndcg_at_100 | |
value: 52.054 | |
- type: ndcg_at_1000 | |
value: 54.151 | |
- type: ndcg_at_3 | |
value: 44.57 | |
- type: ndcg_at_5 | |
value: 45.727000000000004 | |
- type: precision_at_1 | |
value: 40.064 | |
- type: precision_at_10 | |
value: 8.770999999999999 | |
- type: precision_at_100 | |
value: 1.422 | |
- type: precision_at_1000 | |
value: 0.19 | |
- type: precision_at_3 | |
value: 21.741 | |
- type: precision_at_5 | |
value: 14.790000000000001 | |
- type: recall_at_1 | |
value: 31.924000000000003 | |
- type: recall_at_10 | |
value: 56.603 | |
- type: recall_at_100 | |
value: 74.82900000000001 | |
- type: recall_at_1000 | |
value: 88.176 | |
- type: recall_at_3 | |
value: 46.11 | |
- type: recall_at_5 | |
value: 50.273999999999994 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/gaming | |
name: MTEB CQADupstackGamingRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 40.721000000000004 | |
- type: map_at_10 | |
value: 53.053 | |
- type: map_at_100 | |
value: 54.103 | |
- type: map_at_1000 | |
value: 54.157999999999994 | |
- type: map_at_3 | |
value: 49.854 | |
- type: map_at_5 | |
value: 51.547 | |
- type: mrr_at_1 | |
value: 46.833999999999996 | |
- type: mrr_at_10 | |
value: 56.61000000000001 | |
- type: mrr_at_100 | |
value: 57.286 | |
- type: mrr_at_1000 | |
value: 57.312 | |
- type: mrr_at_3 | |
value: 54.17999999999999 | |
- type: mrr_at_5 | |
value: 55.503 | |
- type: ndcg_at_1 | |
value: 46.833999999999996 | |
- type: ndcg_at_10 | |
value: 58.928000000000004 | |
- type: ndcg_at_100 | |
value: 62.939 | |
- type: ndcg_at_1000 | |
value: 63.970000000000006 | |
- type: ndcg_at_3 | |
value: 53.599 | |
- type: ndcg_at_5 | |
value: 55.96600000000001 | |
- type: precision_at_1 | |
value: 46.833999999999996 | |
- type: precision_at_10 | |
value: 9.48 | |
- type: precision_at_100 | |
value: 1.2349999999999999 | |
- type: precision_at_1000 | |
value: 0.13699999999999998 | |
- type: precision_at_3 | |
value: 24.032999999999998 | |
- type: precision_at_5 | |
value: 16.213 | |
- type: recall_at_1 | |
value: 40.721000000000004 | |
- type: recall_at_10 | |
value: 72.653 | |
- type: recall_at_100 | |
value: 89.91900000000001 | |
- type: recall_at_1000 | |
value: 97.092 | |
- type: recall_at_3 | |
value: 58.135999999999996 | |
- type: recall_at_5 | |
value: 64.156 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/gis | |
name: MTEB CQADupstackGisRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 24.938 | |
- type: map_at_10 | |
value: 34.027 | |
- type: map_at_100 | |
value: 34.999 | |
- type: map_at_1000 | |
value: 35.083 | |
- type: map_at_3 | |
value: 31.154 | |
- type: map_at_5 | |
value: 32.767 | |
- type: mrr_at_1 | |
value: 27.006000000000004 | |
- type: mrr_at_10 | |
value: 36.192 | |
- type: mrr_at_100 | |
value: 36.989 | |
- type: mrr_at_1000 | |
value: 37.053999999999995 | |
- type: mrr_at_3 | |
value: 33.503 | |
- type: mrr_at_5 | |
value: 34.977000000000004 | |
- type: ndcg_at_1 | |
value: 27.006000000000004 | |
- type: ndcg_at_10 | |
value: 39.297 | |
- type: ndcg_at_100 | |
value: 44.078 | |
- type: ndcg_at_1000 | |
value: 46.162 | |
- type: ndcg_at_3 | |
value: 33.695 | |
- type: ndcg_at_5 | |
value: 36.401 | |
- type: precision_at_1 | |
value: 27.006000000000004 | |
- type: precision_at_10 | |
value: 6.181 | |
- type: precision_at_100 | |
value: 0.905 | |
- type: precision_at_1000 | |
value: 0.11199999999999999 | |
- type: precision_at_3 | |
value: 14.426 | |
- type: precision_at_5 | |
value: 10.215 | |
- type: recall_at_1 | |
value: 24.938 | |
- type: recall_at_10 | |
value: 53.433 | |
- type: recall_at_100 | |
value: 75.558 | |
- type: recall_at_1000 | |
value: 91.096 | |
- type: recall_at_3 | |
value: 38.421 | |
- type: recall_at_5 | |
value: 44.906 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/mathematica | |
name: MTEB CQADupstackMathematicaRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 15.565999999999999 | |
- type: map_at_10 | |
value: 23.419999999999998 | |
- type: map_at_100 | |
value: 24.678 | |
- type: map_at_1000 | |
value: 24.801000000000002 | |
- type: map_at_3 | |
value: 20.465 | |
- type: map_at_5 | |
value: 21.979000000000003 | |
- type: mrr_at_1 | |
value: 19.652 | |
- type: mrr_at_10 | |
value: 27.929 | |
- type: mrr_at_100 | |
value: 28.92 | |
- type: mrr_at_1000 | |
value: 28.991 | |
- type: mrr_at_3 | |
value: 25.249 | |
- type: mrr_at_5 | |
value: 26.66 | |
- type: ndcg_at_1 | |
value: 19.652 | |
- type: ndcg_at_10 | |
value: 28.869 | |
- type: ndcg_at_100 | |
value: 34.675 | |
- type: ndcg_at_1000 | |
value: 37.577 | |
- type: ndcg_at_3 | |
value: 23.535 | |
- type: ndcg_at_5 | |
value: 25.807999999999996 | |
- type: precision_at_1 | |
value: 19.652 | |
- type: precision_at_10 | |
value: 5.659 | |
- type: precision_at_100 | |
value: 0.979 | |
- type: precision_at_1000 | |
value: 0.13699999999999998 | |
- type: precision_at_3 | |
value: 11.401 | |
- type: precision_at_5 | |
value: 8.581999999999999 | |
- type: recall_at_1 | |
value: 15.565999999999999 | |
- type: recall_at_10 | |
value: 41.163 | |
- type: recall_at_100 | |
value: 66.405 | |
- type: recall_at_1000 | |
value: 87.071 | |
- type: recall_at_3 | |
value: 26.478 | |
- type: recall_at_5 | |
value: 32.217 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/physics | |
name: MTEB CQADupstackPhysicsRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 30.834 | |
- type: map_at_10 | |
value: 41.49 | |
- type: map_at_100 | |
value: 42.897999999999996 | |
- type: map_at_1000 | |
value: 43.004 | |
- type: map_at_3 | |
value: 38.151 | |
- type: map_at_5 | |
value: 40.157 | |
- type: mrr_at_1 | |
value: 38.306000000000004 | |
- type: mrr_at_10 | |
value: 47.371 | |
- type: mrr_at_100 | |
value: 48.265 | |
- type: mrr_at_1000 | |
value: 48.304 | |
- type: mrr_at_3 | |
value: 44.915 | |
- type: mrr_at_5 | |
value: 46.516999999999996 | |
- type: ndcg_at_1 | |
value: 38.306000000000004 | |
- type: ndcg_at_10 | |
value: 47.394999999999996 | |
- type: ndcg_at_100 | |
value: 53.086999999999996 | |
- type: ndcg_at_1000 | |
value: 54.94799999999999 | |
- type: ndcg_at_3 | |
value: 42.384 | |
- type: ndcg_at_5 | |
value: 45.055 | |
- type: precision_at_1 | |
value: 38.306000000000004 | |
- type: precision_at_10 | |
value: 8.624 | |
- type: precision_at_100 | |
value: 1.325 | |
- type: precision_at_1000 | |
value: 0.165 | |
- type: precision_at_3 | |
value: 20.18 | |
- type: precision_at_5 | |
value: 14.418000000000001 | |
- type: recall_at_1 | |
value: 30.834 | |
- type: recall_at_10 | |
value: 58.977000000000004 | |
- type: recall_at_100 | |
value: 82.78 | |
- type: recall_at_1000 | |
value: 94.825 | |
- type: recall_at_3 | |
value: 44.954 | |
- type: recall_at_5 | |
value: 51.925 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/programmers | |
name: MTEB CQADupstackProgrammersRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 28.549000000000003 | |
- type: map_at_10 | |
value: 38.796 | |
- type: map_at_100 | |
value: 40.085 | |
- type: map_at_1000 | |
value: 40.198 | |
- type: map_at_3 | |
value: 35.412 | |
- type: map_at_5 | |
value: 37.116 | |
- type: mrr_at_1 | |
value: 35.388 | |
- type: mrr_at_10 | |
value: 44.626 | |
- type: mrr_at_100 | |
value: 45.445 | |
- type: mrr_at_1000 | |
value: 45.491 | |
- type: mrr_at_3 | |
value: 41.952 | |
- type: mrr_at_5 | |
value: 43.368 | |
- type: ndcg_at_1 | |
value: 35.388 | |
- type: ndcg_at_10 | |
value: 44.894 | |
- type: ndcg_at_100 | |
value: 50.166999999999994 | |
- type: ndcg_at_1000 | |
value: 52.308 | |
- type: ndcg_at_3 | |
value: 39.478 | |
- type: ndcg_at_5 | |
value: 41.608000000000004 | |
- type: precision_at_1 | |
value: 35.388 | |
- type: precision_at_10 | |
value: 8.322000000000001 | |
- type: precision_at_100 | |
value: 1.2670000000000001 | |
- type: precision_at_1000 | |
value: 0.164 | |
- type: precision_at_3 | |
value: 18.836 | |
- type: precision_at_5 | |
value: 13.333 | |
- type: recall_at_1 | |
value: 28.549000000000003 | |
- type: recall_at_10 | |
value: 57.229 | |
- type: recall_at_100 | |
value: 79.541 | |
- type: recall_at_1000 | |
value: 93.887 | |
- type: recall_at_3 | |
value: 42.056 | |
- type: recall_at_5 | |
value: 47.705999999999996 | |
- task: | |
type: Retrieval | |
dataset: | |
type: mteb/cqadupstack | |
name: MTEB CQADupstackRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 26.897333333333336 | |
- type: map_at_10 | |
value: 36.28758333333334 | |
- type: map_at_100 | |
value: 37.480083333333326 | |
- type: map_at_1000 | |
value: 37.59683333333333 | |
- type: map_at_3 | |
value: 33.3485 | |
- type: map_at_5 | |
value: 34.98283333333334 | |
- type: mrr_at_1 | |
value: 31.98916666666667 | |
- type: mrr_at_10 | |
value: 40.61116666666666 | |
- type: mrr_at_100 | |
value: 41.42133333333333 | |
- type: mrr_at_1000 | |
value: 41.476333333333336 | |
- type: mrr_at_3 | |
value: 38.19366666666667 | |
- type: mrr_at_5 | |
value: 39.53125 | |
- type: ndcg_at_1 | |
value: 31.98916666666667 | |
- type: ndcg_at_10 | |
value: 41.73475 | |
- type: ndcg_at_100 | |
value: 46.72291666666666 | |
- type: ndcg_at_1000 | |
value: 48.94916666666666 | |
- type: ndcg_at_3 | |
value: 36.883833333333335 | |
- type: ndcg_at_5 | |
value: 39.114 | |
- type: precision_at_1 | |
value: 31.98916666666667 | |
- type: precision_at_10 | |
value: 7.364083333333335 | |
- type: precision_at_100 | |
value: 1.1604166666666667 | |
- type: precision_at_1000 | |
value: 0.15433333333333335 | |
- type: precision_at_3 | |
value: 17.067500000000003 | |
- type: precision_at_5 | |
value: 12.091916666666666 | |
- type: recall_at_1 | |
value: 26.897333333333336 | |
- type: recall_at_10 | |
value: 53.485749999999996 | |
- type: recall_at_100 | |
value: 75.38716666666666 | |
- type: recall_at_1000 | |
value: 90.75841666666666 | |
- type: recall_at_3 | |
value: 39.86725 | |
- type: recall_at_5 | |
value: 45.683416666666666 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/stats | |
name: MTEB CQADupstackStatsRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 23.544 | |
- type: map_at_10 | |
value: 30.85 | |
- type: map_at_100 | |
value: 31.674000000000003 | |
- type: map_at_1000 | |
value: 31.778000000000002 | |
- type: map_at_3 | |
value: 28.451999999999998 | |
- type: map_at_5 | |
value: 29.797 | |
- type: mrr_at_1 | |
value: 26.687 | |
- type: mrr_at_10 | |
value: 33.725 | |
- type: mrr_at_100 | |
value: 34.439 | |
- type: mrr_at_1000 | |
value: 34.512 | |
- type: mrr_at_3 | |
value: 31.493 | |
- type: mrr_at_5 | |
value: 32.735 | |
- type: ndcg_at_1 | |
value: 26.687 | |
- type: ndcg_at_10 | |
value: 35.207 | |
- type: ndcg_at_100 | |
value: 39.406 | |
- type: ndcg_at_1000 | |
value: 42.021 | |
- type: ndcg_at_3 | |
value: 30.842000000000002 | |
- type: ndcg_at_5 | |
value: 32.882 | |
- type: precision_at_1 | |
value: 26.687 | |
- type: precision_at_10 | |
value: 5.66 | |
- type: precision_at_100 | |
value: 0.836 | |
- type: precision_at_1000 | |
value: 0.11299999999999999 | |
- type: precision_at_3 | |
value: 13.395000000000001 | |
- type: precision_at_5 | |
value: 9.386999999999999 | |
- type: recall_at_1 | |
value: 23.544 | |
- type: recall_at_10 | |
value: 45.769 | |
- type: recall_at_100 | |
value: 65.33200000000001 | |
- type: recall_at_1000 | |
value: 84.82499999999999 | |
- type: recall_at_3 | |
value: 33.665 | |
- type: recall_at_5 | |
value: 38.795 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/tex | |
name: MTEB CQADupstackTexRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 16.524 | |
- type: map_at_10 | |
value: 23.65 | |
- type: map_at_100 | |
value: 24.654999999999998 | |
- type: map_at_1000 | |
value: 24.786 | |
- type: map_at_3 | |
value: 21.441 | |
- type: map_at_5 | |
value: 22.664 | |
- type: mrr_at_1 | |
value: 20.372 | |
- type: mrr_at_10 | |
value: 27.548000000000002 | |
- type: mrr_at_100 | |
value: 28.37 | |
- type: mrr_at_1000 | |
value: 28.449 | |
- type: mrr_at_3 | |
value: 25.291999999999998 | |
- type: mrr_at_5 | |
value: 26.596999999999998 | |
- type: ndcg_at_1 | |
value: 20.372 | |
- type: ndcg_at_10 | |
value: 28.194000000000003 | |
- type: ndcg_at_100 | |
value: 32.955 | |
- type: ndcg_at_1000 | |
value: 35.985 | |
- type: ndcg_at_3 | |
value: 24.212 | |
- type: ndcg_at_5 | |
value: 26.051000000000002 | |
- type: precision_at_1 | |
value: 20.372 | |
- type: precision_at_10 | |
value: 5.237 | |
- type: precision_at_100 | |
value: 0.8909999999999999 | |
- type: precision_at_1000 | |
value: 0.132 | |
- type: precision_at_3 | |
value: 11.643 | |
- type: precision_at_5 | |
value: 8.424 | |
- type: recall_at_1 | |
value: 16.524 | |
- type: recall_at_10 | |
value: 37.969 | |
- type: recall_at_100 | |
value: 59.48 | |
- type: recall_at_1000 | |
value: 81.04599999999999 | |
- type: recall_at_3 | |
value: 26.647 | |
- type: recall_at_5 | |
value: 31.558999999999997 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/unix | |
name: MTEB CQADupstackUnixRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 26.273000000000003 | |
- type: map_at_10 | |
value: 35.176 | |
- type: map_at_100 | |
value: 36.367 | |
- type: map_at_1000 | |
value: 36.473 | |
- type: map_at_3 | |
value: 32.583 | |
- type: map_at_5 | |
value: 33.977000000000004 | |
- type: mrr_at_1 | |
value: 30.97 | |
- type: mrr_at_10 | |
value: 39.31 | |
- type: mrr_at_100 | |
value: 40.225 | |
- type: mrr_at_1000 | |
value: 40.284 | |
- type: mrr_at_3 | |
value: 37.111 | |
- type: mrr_at_5 | |
value: 38.296 | |
- type: ndcg_at_1 | |
value: 30.97 | |
- type: ndcg_at_10 | |
value: 40.323 | |
- type: ndcg_at_100 | |
value: 45.725 | |
- type: ndcg_at_1000 | |
value: 48.022 | |
- type: ndcg_at_3 | |
value: 35.772 | |
- type: ndcg_at_5 | |
value: 37.741 | |
- type: precision_at_1 | |
value: 30.97 | |
- type: precision_at_10 | |
value: 6.819 | |
- type: precision_at_100 | |
value: 1.061 | |
- type: precision_at_1000 | |
value: 0.136 | |
- type: precision_at_3 | |
value: 16.387 | |
- type: precision_at_5 | |
value: 11.437 | |
- type: recall_at_1 | |
value: 26.273000000000003 | |
- type: recall_at_10 | |
value: 51.772 | |
- type: recall_at_100 | |
value: 75.362 | |
- type: recall_at_1000 | |
value: 91.232 | |
- type: recall_at_3 | |
value: 39.172000000000004 | |
- type: recall_at_5 | |
value: 44.147999999999996 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/webmasters | |
name: MTEB CQADupstackWebmastersRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 28.326 | |
- type: map_at_10 | |
value: 37.97 | |
- type: map_at_100 | |
value: 39.602 | |
- type: map_at_1000 | |
value: 39.812999999999995 | |
- type: map_at_3 | |
value: 34.838 | |
- type: map_at_5 | |
value: 36.582 | |
- type: mrr_at_1 | |
value: 33.992 | |
- type: mrr_at_10 | |
value: 42.875 | |
- type: mrr_at_100 | |
value: 43.78 | |
- type: mrr_at_1000 | |
value: 43.827 | |
- type: mrr_at_3 | |
value: 40.481 | |
- type: mrr_at_5 | |
value: 41.657 | |
- type: ndcg_at_1 | |
value: 33.992 | |
- type: ndcg_at_10 | |
value: 44.122 | |
- type: ndcg_at_100 | |
value: 49.652 | |
- type: ndcg_at_1000 | |
value: 51.919000000000004 | |
- type: ndcg_at_3 | |
value: 39.285 | |
- type: ndcg_at_5 | |
value: 41.449999999999996 | |
- type: precision_at_1 | |
value: 33.992 | |
- type: precision_at_10 | |
value: 8.32 | |
- type: precision_at_100 | |
value: 1.617 | |
- type: precision_at_1000 | |
value: 0.245 | |
- type: precision_at_3 | |
value: 18.445 | |
- type: precision_at_5 | |
value: 13.281 | |
- type: recall_at_1 | |
value: 28.326 | |
- type: recall_at_10 | |
value: 55.822 | |
- type: recall_at_100 | |
value: 80.352 | |
- type: recall_at_1000 | |
value: 94.441 | |
- type: recall_at_3 | |
value: 41.704 | |
- type: recall_at_5 | |
value: 47.513 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/wordpress | |
name: MTEB CQADupstackWordpressRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 22.526 | |
- type: map_at_10 | |
value: 30.206 | |
- type: map_at_100 | |
value: 31.142999999999997 | |
- type: map_at_1000 | |
value: 31.246000000000002 | |
- type: map_at_3 | |
value: 27.807 | |
- type: map_at_5 | |
value: 29.236 | |
- type: mrr_at_1 | |
value: 24.399 | |
- type: mrr_at_10 | |
value: 32.515 | |
- type: mrr_at_100 | |
value: 33.329 | |
- type: mrr_at_1000 | |
value: 33.400999999999996 | |
- type: mrr_at_3 | |
value: 30.159999999999997 | |
- type: mrr_at_5 | |
value: 31.482 | |
- type: ndcg_at_1 | |
value: 24.399 | |
- type: ndcg_at_10 | |
value: 34.806 | |
- type: ndcg_at_100 | |
value: 39.669 | |
- type: ndcg_at_1000 | |
value: 42.234 | |
- type: ndcg_at_3 | |
value: 30.144 | |
- type: ndcg_at_5 | |
value: 32.481 | |
- type: precision_at_1 | |
value: 24.399 | |
- type: precision_at_10 | |
value: 5.453 | |
- type: precision_at_100 | |
value: 0.8410000000000001 | |
- type: precision_at_1000 | |
value: 0.117 | |
- type: precision_at_3 | |
value: 12.815999999999999 | |
- type: precision_at_5 | |
value: 9.057 | |
- type: recall_at_1 | |
value: 22.526 | |
- type: recall_at_10 | |
value: 46.568 | |
- type: recall_at_100 | |
value: 69.56099999999999 | |
- type: recall_at_1000 | |
value: 88.474 | |
- type: recall_at_3 | |
value: 34.205000000000005 | |
- type: recall_at_5 | |
value: 39.885999999999996 | |
- task: | |
type: Retrieval | |
dataset: | |
type: climate-fever | |
name: MTEB ClimateFEVER | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 14.363000000000001 | |
- type: map_at_10 | |
value: 24.101 | |
- type: map_at_100 | |
value: 26.240000000000002 | |
- type: map_at_1000 | |
value: 26.427 | |
- type: map_at_3 | |
value: 20.125 | |
- type: map_at_5 | |
value: 22.128 | |
- type: mrr_at_1 | |
value: 32.182 | |
- type: mrr_at_10 | |
value: 44.711 | |
- type: mrr_at_100 | |
value: 45.523 | |
- type: mrr_at_1000 | |
value: 45.551 | |
- type: mrr_at_3 | |
value: 41.443999999999996 | |
- type: mrr_at_5 | |
value: 43.473 | |
- type: ndcg_at_1 | |
value: 32.182 | |
- type: ndcg_at_10 | |
value: 33.495000000000005 | |
- type: ndcg_at_100 | |
value: 41.192 | |
- type: ndcg_at_1000 | |
value: 44.346000000000004 | |
- type: ndcg_at_3 | |
value: 27.651999999999997 | |
- type: ndcg_at_5 | |
value: 29.634 | |
- type: precision_at_1 | |
value: 32.182 | |
- type: precision_at_10 | |
value: 10.391 | |
- type: precision_at_100 | |
value: 1.8679999999999999 | |
- type: precision_at_1000 | |
value: 0.246 | |
- type: precision_at_3 | |
value: 20.586 | |
- type: precision_at_5 | |
value: 15.648000000000001 | |
- type: recall_at_1 | |
value: 14.363000000000001 | |
- type: recall_at_10 | |
value: 39.706 | |
- type: recall_at_100 | |
value: 65.763 | |
- type: recall_at_1000 | |
value: 83.296 | |
- type: recall_at_3 | |
value: 25.064999999999998 | |
- type: recall_at_5 | |
value: 31.085 | |
- task: | |
type: Retrieval | |
dataset: | |
type: dbpedia-entity | |
name: MTEB DBPedia | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 8.698 | |
- type: map_at_10 | |
value: 20.237 | |
- type: map_at_100 | |
value: 28.534 | |
- type: map_at_1000 | |
value: 30.346 | |
- type: map_at_3 | |
value: 14.097999999999999 | |
- type: map_at_5 | |
value: 16.567999999999998 | |
- type: mrr_at_1 | |
value: 68.0 | |
- type: mrr_at_10 | |
value: 76.35 | |
- type: mrr_at_100 | |
value: 76.676 | |
- type: mrr_at_1000 | |
value: 76.68 | |
- type: mrr_at_3 | |
value: 74.792 | |
- type: mrr_at_5 | |
value: 75.717 | |
- type: ndcg_at_1 | |
value: 56.25 | |
- type: ndcg_at_10 | |
value: 43.578 | |
- type: ndcg_at_100 | |
value: 47.928 | |
- type: ndcg_at_1000 | |
value: 55.312 | |
- type: ndcg_at_3 | |
value: 47.744 | |
- type: ndcg_at_5 | |
value: 45.257 | |
- type: precision_at_1 | |
value: 68.0 | |
- type: precision_at_10 | |
value: 35.275 | |
- type: precision_at_100 | |
value: 10.985 | |
- type: precision_at_1000 | |
value: 2.235 | |
- type: precision_at_3 | |
value: 52.0 | |
- type: precision_at_5 | |
value: 44.45 | |
- type: recall_at_1 | |
value: 8.698 | |
- type: recall_at_10 | |
value: 26.661 | |
- type: recall_at_100 | |
value: 54.686 | |
- type: recall_at_1000 | |
value: 77.795 | |
- type: recall_at_3 | |
value: 15.536 | |
- type: recall_at_5 | |
value: 19.578 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/emotion | |
name: MTEB EmotionClassification | |
config: default | |
split: test | |
revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 | |
metrics: | |
- type: accuracy | |
value: 48.385000000000005 | |
- type: f1 | |
value: 43.818784352804165 | |
- task: | |
type: Retrieval | |
dataset: | |
type: fever | |
name: MTEB FEVER | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 75.399 | |
- type: map_at_10 | |
value: 83.02199999999999 | |
- type: map_at_100 | |
value: 83.204 | |
- type: map_at_1000 | |
value: 83.217 | |
- type: map_at_3 | |
value: 81.86 | |
- type: map_at_5 | |
value: 82.677 | |
- type: mrr_at_1 | |
value: 81.233 | |
- type: mrr_at_10 | |
value: 88.10900000000001 | |
- type: mrr_at_100 | |
value: 88.17099999999999 | |
- type: mrr_at_1000 | |
value: 88.172 | |
- type: mrr_at_3 | |
value: 87.289 | |
- type: mrr_at_5 | |
value: 87.897 | |
- type: ndcg_at_1 | |
value: 81.233 | |
- type: ndcg_at_10 | |
value: 86.80600000000001 | |
- type: ndcg_at_100 | |
value: 87.492 | |
- type: ndcg_at_1000 | |
value: 87.71600000000001 | |
- type: ndcg_at_3 | |
value: 84.975 | |
- type: ndcg_at_5 | |
value: 86.158 | |
- type: precision_at_1 | |
value: 81.233 | |
- type: precision_at_10 | |
value: 10.299999999999999 | |
- type: precision_at_100 | |
value: 1.085 | |
- type: precision_at_1000 | |
value: 0.11199999999999999 | |
- type: precision_at_3 | |
value: 32.178000000000004 | |
- type: precision_at_5 | |
value: 20.069 | |
- type: recall_at_1 | |
value: 75.399 | |
- type: recall_at_10 | |
value: 93.533 | |
- type: recall_at_100 | |
value: 96.32300000000001 | |
- type: recall_at_1000 | |
value: 97.695 | |
- type: recall_at_3 | |
value: 88.61099999999999 | |
- type: recall_at_5 | |
value: 91.617 | |
- task: | |
type: Retrieval | |
dataset: | |
type: fiqa | |
name: MTEB FiQA2018 | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 20.564 | |
- type: map_at_10 | |
value: 33.162000000000006 | |
- type: map_at_100 | |
value: 35.146 | |
- type: map_at_1000 | |
value: 35.32 | |
- type: map_at_3 | |
value: 28.786 | |
- type: map_at_5 | |
value: 31.22 | |
- type: mrr_at_1 | |
value: 40.278000000000006 | |
- type: mrr_at_10 | |
value: 48.577 | |
- type: mrr_at_100 | |
value: 49.385 | |
- type: mrr_at_1000 | |
value: 49.423 | |
- type: mrr_at_3 | |
value: 46.116 | |
- type: mrr_at_5 | |
value: 47.305 | |
- type: ndcg_at_1 | |
value: 40.278000000000006 | |
- type: ndcg_at_10 | |
value: 40.998000000000005 | |
- type: ndcg_at_100 | |
value: 48.329 | |
- type: ndcg_at_1000 | |
value: 51.148 | |
- type: ndcg_at_3 | |
value: 36.852000000000004 | |
- type: ndcg_at_5 | |
value: 38.146 | |
- type: precision_at_1 | |
value: 40.278000000000006 | |
- type: precision_at_10 | |
value: 11.466 | |
- type: precision_at_100 | |
value: 1.9120000000000001 | |
- type: precision_at_1000 | |
value: 0.242 | |
- type: precision_at_3 | |
value: 24.383 | |
- type: precision_at_5 | |
value: 18.179000000000002 | |
- type: recall_at_1 | |
value: 20.564 | |
- type: recall_at_10 | |
value: 48.327999999999996 | |
- type: recall_at_100 | |
value: 75.89 | |
- type: recall_at_1000 | |
value: 92.826 | |
- type: recall_at_3 | |
value: 33.517 | |
- type: recall_at_5 | |
value: 39.46 | |
  - task:
      type: Retrieval
    dataset:
      type: hotpotqa
      name: MTEB HotpotQA
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 34.294000000000004
    - type: map_at_10
      value: 55.435
    - type: map_at_100
      value: 56.507
    - type: map_at_1000
      value: 56.57600000000001
    - type: map_at_3
      value: 51.654999999999994
    - type: map_at_5
      value: 54.086
    - type: mrr_at_1
      value: 68.589
    - type: mrr_at_10
      value: 75.837
    - type: mrr_at_100
      value: 76.142
    - type: mrr_at_1000
      value: 76.155
    - type: mrr_at_3
      value: 74.50099999999999
    - type: mrr_at_5
      value: 75.339
    - type: ndcg_at_1
      value: 68.589
    - type: ndcg_at_10
      value: 63.846000000000004
    - type: ndcg_at_100
      value: 67.65
    - type: ndcg_at_1000
      value: 69.015
    - type: ndcg_at_3
      value: 58.355999999999995
    - type: ndcg_at_5
      value: 61.489000000000004
    - type: precision_at_1
      value: 68.589
    - type: precision_at_10
      value: 13.738
    - type: precision_at_100
      value: 1.67
    - type: precision_at_1000
      value: 0.185
    - type: precision_at_3
      value: 37.736
    - type: precision_at_5
      value: 25.11
    - type: recall_at_1
      value: 34.294000000000004
    - type: recall_at_10
      value: 68.69
    - type: recall_at_100
      value: 83.477
    - type: recall_at_1000
      value: 92.465
    - type: recall_at_3
      value: 56.604
    - type: recall_at_5
      value: 62.775000000000006
  - task:
      type: Classification
    dataset:
      type: mteb/imdb
      name: MTEB ImdbClassification
      config: default
      split: test
      revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7
    metrics:
    - type: accuracy
      value: 75.332
    - type: ap
      value: 69.58548013224627
    - type: f1
      value: 75.19505914957745
  - task:
      type: Retrieval
    dataset:
      type: msmarco
      name: MTEB MSMARCO
      config: default
      split: dev
      revision: None
    metrics:
    - type: map_at_1
      value: 19.373
    - type: map_at_10
      value: 31.377
    - type: map_at_100
      value: 32.635
    - type: map_at_1000
      value: 32.688
    - type: map_at_3
      value: 27.337
    - type: map_at_5
      value: 29.608
    - type: mrr_at_1
      value: 19.900000000000002
    - type: mrr_at_10
      value: 31.928
    - type: mrr_at_100
      value: 33.14
    - type: mrr_at_1000
      value: 33.184999999999995
    - type: mrr_at_3
      value: 27.955999999999996
    - type: mrr_at_5
      value: 30.209999999999997
    - type: ndcg_at_1
      value: 19.900000000000002
    - type: ndcg_at_10
      value: 38.324000000000005
    - type: ndcg_at_100
      value: 44.45
    - type: ndcg_at_1000
      value: 45.728
    - type: ndcg_at_3
      value: 30.099999999999998
    - type: ndcg_at_5
      value: 34.157
    - type: precision_at_1
      value: 19.900000000000002
    - type: precision_at_10
      value: 6.246
    - type: precision_at_100
      value: 0.932
    - type: precision_at_1000
      value: 0.104
    - type: precision_at_3
      value: 12.937000000000001
    - type: precision_at_5
      value: 9.817
    - type: recall_at_1
      value: 19.373
    - type: recall_at_10
      value: 59.82300000000001
    - type: recall_at_100
      value: 88.252
    - type: recall_at_1000
      value: 97.962
    - type: recall_at_3
      value: 37.480999999999995
    - type: recall_at_5
      value: 47.215
  - task:
      type: Classification
    dataset:
      type: mteb/mtop_domain
      name: MTEB MTOPDomainClassification (en)
      config: en
      split: test
      revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf
    metrics:
    - type: accuracy
      value: 94.08800729594162
    - type: f1
      value: 93.6743110282188
  - task:
      type: Classification
    dataset:
      type: mteb/mtop_intent
      name: MTEB MTOPIntentClassification (en)
      config: en
      split: test
      revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba
    metrics:
    - type: accuracy
      value: 77.04742362061104
    - type: f1
      value: 59.62885599991211
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_intent
      name: MTEB MassiveIntentClassification (en)
      config: en
      split: test
      revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
    metrics:
    - type: accuracy
      value: 75.58170813718897
    - type: f1
      value: 73.57458347240402
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_massive_scenario
      name: MTEB MassiveScenarioClassification (en)
      config: en
      split: test
      revision: 7d571f92784cd94a019292a1f45445077d0ef634
    metrics:
    - type: accuracy
      value: 79.15601882985877
    - type: f1
      value: 79.08126473478004
  - task:
      type: Clustering
    dataset:
      type: mteb/medrxiv-clustering-p2p
      name: MTEB MedrxivClusteringP2P
      config: default
      split: test
      revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73
    metrics:
    - type: v_measure
      value: 33.551020623875196
  - task:
      type: Clustering
    dataset:
      type: mteb/medrxiv-clustering-s2s
      name: MTEB MedrxivClusteringS2S
      config: default
      split: test
      revision: 35191c8c0dca72d8ff3efcd72aa802307d469663
    metrics:
    - type: v_measure
      value: 31.110159113704523
  - task:
      type: Reranking
    dataset:
      type: mteb/mind_small
      name: MTEB MindSmallReranking
      config: default
      split: test
      revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69
    metrics:
    - type: map
      value: 31.960982592404424
    - type: mrr
      value: 33.106781262600435
  - task:
      type: Retrieval
    dataset:
      type: nfcorpus
      name: MTEB NFCorpus
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 5.679
    - type: map_at_10
      value: 13.922
    - type: map_at_100
      value: 17.949
    - type: map_at_1000
      value: 19.573999999999998
    - type: map_at_3
      value: 10.061
    - type: map_at_5
      value: 11.931
    - type: mrr_at_1
      value: 47.678
    - type: mrr_at_10
      value: 56.701
    - type: mrr_at_100
      value: 57.221
    - type: mrr_at_1000
      value: 57.260999999999996
    - type: mrr_at_3
      value: 54.334
    - type: mrr_at_5
      value: 55.85099999999999
    - type: ndcg_at_1
      value: 45.975
    - type: ndcg_at_10
      value: 37.117
    - type: ndcg_at_100
      value: 34.633
    - type: ndcg_at_1000
      value: 43.498
    - type: ndcg_at_3
      value: 42.475
    - type: ndcg_at_5
      value: 40.438
    - type: precision_at_1
      value: 47.678
    - type: precision_at_10
      value: 27.647
    - type: precision_at_100
      value: 9.08
    - type: precision_at_1000
      value: 2.218
    - type: precision_at_3
      value: 39.938
    - type: precision_at_5
      value: 35.17
    - type: recall_at_1
      value: 5.679
    - type: recall_at_10
      value: 18.552
    - type: recall_at_100
      value: 35.799
    - type: recall_at_1000
      value: 68.029
    - type: recall_at_3
      value: 11.43
    - type: recall_at_5
      value: 14.71
  - task:
      type: Retrieval
    dataset:
      type: nq
      name: MTEB NQ
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 29.055999999999997
    - type: map_at_10
      value: 45.547
    - type: map_at_100
      value: 46.591
    - type: map_at_1000
      value: 46.615
    - type: map_at_3
      value: 40.81
    - type: map_at_5
      value: 43.673
    - type: mrr_at_1
      value: 32.763999999999996
    - type: mrr_at_10
      value: 47.937999999999995
    - type: mrr_at_100
      value: 48.691
    - type: mrr_at_1000
      value: 48.705
    - type: mrr_at_3
      value: 43.984
    - type: mrr_at_5
      value: 46.467999999999996
    - type: ndcg_at_1
      value: 32.763999999999996
    - type: ndcg_at_10
      value: 53.891999999999996
    - type: ndcg_at_100
      value: 58.167
    - type: ndcg_at_1000
      value: 58.67099999999999
    - type: ndcg_at_3
      value: 45.007999999999996
    - type: ndcg_at_5
      value: 49.805
    - type: precision_at_1
      value: 32.763999999999996
    - type: precision_at_10
      value: 9.186
    - type: precision_at_100
      value: 1.1560000000000001
    - type: precision_at_1000
      value: 0.12
    - type: precision_at_3
      value: 21.012
    - type: precision_at_5
      value: 15.348
    - type: recall_at_1
      value: 29.055999999999997
    - type: recall_at_10
      value: 76.864
    - type: recall_at_100
      value: 95.254
    - type: recall_at_1000
      value: 98.914
    - type: recall_at_3
      value: 53.911
    - type: recall_at_5
      value: 64.982
  - task:
      type: Retrieval
    dataset:
      type: quora
      name: MTEB QuoraRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 69.393
    - type: map_at_10
      value: 83.408
    - type: map_at_100
      value: 84.071
    - type: map_at_1000
      value: 84.086
    - type: map_at_3
      value: 80.372
    - type: map_at_5
      value: 82.245
    - type: mrr_at_1
      value: 80.06
    - type: mrr_at_10
      value: 86.546
    - type: mrr_at_100
      value: 86.661
    - type: mrr_at_1000
      value: 86.66199999999999
    - type: mrr_at_3
      value: 85.56700000000001
    - type: mrr_at_5
      value: 86.215
    - type: ndcg_at_1
      value: 80.07
    - type: ndcg_at_10
      value: 87.372
    - type: ndcg_at_100
      value: 88.683
    - type: ndcg_at_1000
      value: 88.78
    - type: ndcg_at_3
      value: 84.384
    - type: ndcg_at_5
      value: 85.978
    - type: precision_at_1
      value: 80.07
    - type: precision_at_10
      value: 13.345
    - type: precision_at_100
      value: 1.5350000000000001
    - type: precision_at_1000
      value: 0.157
    - type: precision_at_3
      value: 36.973
    - type: precision_at_5
      value: 24.334
    - type: recall_at_1
      value: 69.393
    - type: recall_at_10
      value: 94.994
    - type: recall_at_100
      value: 99.523
    - type: recall_at_1000
      value: 99.97399999999999
    - type: recall_at_3
      value: 86.459
    - type: recall_at_5
      value: 90.962
  - task:
      type: Clustering
    dataset:
      type: mteb/reddit-clustering
      name: MTEB RedditClustering
      config: default
      split: test
      revision: 24640382cdbf8abc73003fb0fa6d111a705499eb
    metrics:
    - type: v_measure
      value: 53.02365304347829
  - task:
      type: Clustering
    dataset:
      type: mteb/reddit-clustering-p2p
      name: MTEB RedditClusteringP2P
      config: default
      split: test
      revision: 282350215ef01743dc01b456c7f5241fa8937f16
    metrics:
    - type: v_measure
      value: 60.4722130918676
  - task:
      type: Retrieval
    dataset:
      type: scidocs
      name: MTEB SCIDOCS
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 4.233
    - type: map_at_10
      value: 10.333
    - type: map_at_100
      value: 12.286
    - type: map_at_1000
      value: 12.594
    - type: map_at_3
      value: 7.514
    - type: map_at_5
      value: 8.774
    - type: mrr_at_1
      value: 20.9
    - type: mrr_at_10
      value: 31.232
    - type: mrr_at_100
      value: 32.287
    - type: mrr_at_1000
      value: 32.352
    - type: mrr_at_3
      value: 27.766999999999996
    - type: mrr_at_5
      value: 29.487000000000002
    - type: ndcg_at_1
      value: 20.9
    - type: ndcg_at_10
      value: 17.957
    - type: ndcg_at_100
      value: 25.526
    - type: ndcg_at_1000
      value: 31.097
    - type: ndcg_at_3
      value: 16.915
    - type: ndcg_at_5
      value: 14.579
    - type: precision_at_1
      value: 20.9
    - type: precision_at_10
      value: 9.41
    - type: precision_at_100
      value: 2.032
    - type: precision_at_1000
      value: 0.337
    - type: precision_at_3
      value: 15.767000000000001
    - type: precision_at_5
      value: 12.659999999999998
    - type: recall_at_1
      value: 4.233
    - type: recall_at_10
      value: 19.067999999999998
    - type: recall_at_100
      value: 41.257
    - type: recall_at_1000
      value: 68.487
    - type: recall_at_3
      value: 9.618
    - type: recall_at_5
      value: 12.853
  - task:
      type: STS
    dataset:
      type: mteb/sickr-sts
      name: MTEB SICK-R
      config: default
      split: test
      revision: a6ea5a8cab320b040a23452cc28066d9beae2cee
    metrics:
    - type: cos_sim_spearman
      value: 82.25303886615637
  - task:
      type: STS
    dataset:
      type: mteb/sts12-sts
      name: MTEB STS12
      config: default
      split: test
      revision: a0d554a64d88156834ff5ae9920b964011b16384
    metrics:
    - type: cos_sim_spearman
      value: 78.27678362978094
  - task:
      type: STS
    dataset:
      type: mteb/sts13-sts
      name: MTEB STS13
      config: default
      split: test
      revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca
    metrics:
    - type: cos_sim_spearman
      value: 85.5228883863618
  - task:
      type: STS
    dataset:
      type: mteb/sts14-sts
      name: MTEB STS14
      config: default
      split: test
      revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375
    metrics:
    - type: cos_sim_spearman
      value: 82.48847836687274
  - task:
      type: STS
    dataset:
      type: mteb/sts15-sts
      name: MTEB STS15
      config: default
      split: test
      revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3
    metrics:
    - type: cos_sim_spearman
      value: 88.76235312662311
  - task:
      type: STS
    dataset:
      type: mteb/sts16-sts
      name: MTEB STS16
      config: default
      split: test
      revision: 4d8694f8f0e0100860b497b999b3dbed754a0513
    metrics:
    - type: cos_sim_spearman
      value: 87.10893533398001
  - task:
      type: STS
    dataset:
      type: mteb/sts17-crosslingual-sts
      name: MTEB STS17 (en-en)
      config: en-en
      split: test
      revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d
    metrics:
    - type: cos_sim_spearman
      value: 90.10224405448504
  - task:
      type: STS
    dataset:
      type: mteb/sts22-crosslingual-sts
      name: MTEB STS22 (en)
      config: en
      split: test
      revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80
    metrics:
    - type: cos_sim_spearman
      value: 68.25088774601221
  - task:
      type: STS
    dataset:
      type: mteb/stsbenchmark-sts
      name: MTEB STSBenchmark
      config: default
      split: test
      revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831
    metrics:
    - type: cos_sim_spearman
      value: 87.15751321128134
  - task:
      type: Reranking
    dataset:
      type: mteb/scidocs-reranking
      name: MTEB SciDocsRR
      config: default
      split: test
      revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab
    metrics:
    - type: map
      value: 79.23418699664575
    - type: mrr
      value: 93.72032288698955
  - task:
      type: Retrieval
    dataset:
      type: scifact
      name: MTEB SciFact
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 56.511
    - type: map_at_10
      value: 67.062
    - type: map_at_100
      value: 67.537
    - type: map_at_1000
      value: 67.553
    - type: map_at_3
      value: 63.375
    - type: map_at_5
      value: 65.828
    - type: mrr_at_1
      value: 59.333000000000006
    - type: mrr_at_10
      value: 67.95
    - type: mrr_at_100
      value: 68.284
    - type: mrr_at_1000
      value: 68.30000000000001
    - type: mrr_at_3
      value: 65.0
    - type: mrr_at_5
      value: 66.93299999999999
    - type: ndcg_at_1
      value: 59.333000000000006
    - type: ndcg_at_10
      value: 72.08099999999999
    - type: ndcg_at_100
      value: 74.232
    - type: ndcg_at_1000
      value: 74.657
    - type: ndcg_at_3
      value: 65.72200000000001
    - type: ndcg_at_5
      value: 69.395
    - type: precision_at_1
      value: 59.333000000000006
    - type: precision_at_10
      value: 9.8
    - type: precision_at_100
      value: 1.097
    - type: precision_at_1000
      value: 0.11299999999999999
    - type: precision_at_3
      value: 25.444
    - type: precision_at_5
      value: 17.533
    - type: recall_at_1
      value: 56.511
    - type: recall_at_10
      value: 86.63300000000001
    - type: recall_at_100
      value: 96.667
    - type: recall_at_1000
      value: 100.0
    - type: recall_at_3
      value: 70.217
    - type: recall_at_5
      value: 78.806
  - task:
      type: PairClassification
    dataset:
      type: mteb/sprintduplicatequestions-pairclassification
      name: MTEB SprintDuplicateQuestions
      config: default
      split: test
      revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46
    metrics:
    - type: cos_sim_accuracy
      value: 99.83861386138614
    - type: cos_sim_ap
      value: 96.24728474711715
    - type: cos_sim_f1
      value: 91.76351692774129
    - type: cos_sim_precision
      value: 92.74770173646579
    - type: cos_sim_recall
      value: 90.8
    - type: dot_accuracy
      value: 99.62475247524752
    - type: dot_ap
      value: 88.12302791709324
    - type: dot_f1
      value: 81.0187409899087
    - type: dot_precision
      value: 77.98334875115633
    - type: dot_recall
      value: 84.3
    - type: euclidean_accuracy
      value: 99.83465346534653
    - type: euclidean_ap
      value: 95.79574410387337
    - type: euclidean_f1
      value: 91.56139464375947
    - type: euclidean_precision
      value: 92.54341164453524
    - type: euclidean_recall
      value: 90.60000000000001
    - type: manhattan_accuracy
      value: 99.84059405940594
    - type: manhattan_ap
      value: 95.81230332276807
    - type: manhattan_f1
      value: 91.80661577608143
    - type: manhattan_precision
      value: 93.47150259067357
    - type: manhattan_recall
      value: 90.2
    - type: max_accuracy
      value: 99.84059405940594
    - type: max_ap
      value: 96.24728474711715
    - type: max_f1
      value: 91.80661577608143
  - task:
      type: Clustering
    dataset:
      type: mteb/stackexchange-clustering
      name: MTEB StackExchangeClustering
      config: default
      split: test
      revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259
    metrics:
    - type: v_measure
      value: 63.035694955649866
  - task:
      type: Clustering
    dataset:
      type: mteb/stackexchange-clustering-p2p
      name: MTEB StackExchangeClusteringP2P
      config: default
      split: test
      revision: 815ca46b2622cec33ccafc3735d572c266efdb44
    metrics:
    - type: v_measure
      value: 34.00935398440242
  - task:
      type: Reranking
    dataset:
      type: mteb/stackoverflowdupquestions-reranking
      name: MTEB StackOverflowDupQuestions
      config: default
      split: test
      revision: e185fbe320c72810689fc5848eb6114e1ef5ec69
    metrics:
    - type: map
      value: 49.61138657342161
    - type: mrr
      value: 50.26590749936338
  - task:
      type: Summarization
    dataset:
      type: mteb/summeval
      name: MTEB SummEval
      config: default
      split: test
      revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c
    metrics:
    - type: cos_sim_pearson
      value: 30.994071916424655
    - type: cos_sim_spearman
      value: 30.010135460886296
    - type: dot_pearson
      value: 27.03290596322524
    - type: dot_spearman
      value: 28.824264579690357
  - task:
      type: Retrieval
    dataset:
      type: trec-covid
      name: MTEB TRECCOVID
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 0.247
    - type: map_at_10
      value: 2.01
    - type: map_at_100
      value: 12.912
    - type: map_at_1000
      value: 32.35
    - type: map_at_3
      value: 0.6859999999999999
    - type: map_at_5
      value: 1.089
    - type: mrr_at_1
      value: 92.0
    - type: mrr_at_10
      value: 95.25
    - type: mrr_at_100
      value: 95.25
    - type: mrr_at_1000
      value: 95.25
    - type: mrr_at_3
      value: 95.0
    - type: mrr_at_5
      value: 95.0
    - type: ndcg_at_1
      value: 88.0
    - type: ndcg_at_10
      value: 80.411
    - type: ndcg_at_100
      value: 63.871
    - type: ndcg_at_1000
      value: 58.145
    - type: ndcg_at_3
      value: 84.75399999999999
    - type: ndcg_at_5
      value: 82.372
    - type: precision_at_1
      value: 92.0
    - type: precision_at_10
      value: 84.8
    - type: precision_at_100
      value: 65.84
    - type: precision_at_1000
      value: 25.874000000000002
    - type: precision_at_3
      value: 90.0
    - type: precision_at_5
      value: 88.0
    - type: recall_at_1
      value: 0.247
    - type: recall_at_10
      value: 2.185
    - type: recall_at_100
      value: 16.051000000000002
    - type: recall_at_1000
      value: 55.18300000000001
    - type: recall_at_3
      value: 0.701
    - type: recall_at_5
      value: 1.1360000000000001
  - task:
      type: Retrieval
    dataset:
      type: webis-touche2020
      name: MTEB Touche2020
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 2.094
    - type: map_at_10
      value: 9.078
    - type: map_at_100
      value: 15.152
    - type: map_at_1000
      value: 16.773
    - type: map_at_3
      value: 4.67
    - type: map_at_5
      value: 6.111
    - type: mrr_at_1
      value: 24.490000000000002
    - type: mrr_at_10
      value: 39.989000000000004
    - type: mrr_at_100
      value: 41.248000000000005
    - type: mrr_at_1000
      value: 41.248000000000005
    - type: mrr_at_3
      value: 37.075
    - type: mrr_at_5
      value: 38.503
    - type: ndcg_at_1
      value: 21.429000000000002
    - type: ndcg_at_10
      value: 22.312
    - type: ndcg_at_100
      value: 35.077999999999996
    - type: ndcg_at_1000
      value: 46.903
    - type: ndcg_at_3
      value: 24.241
    - type: ndcg_at_5
      value: 21.884
    - type: precision_at_1
      value: 24.490000000000002
    - type: precision_at_10
      value: 20.816000000000003
    - type: precision_at_100
      value: 7.673000000000001
    - type: precision_at_1000
      value: 1.569
    - type: precision_at_3
      value: 27.211000000000002
    - type: precision_at_5
      value: 22.857
    - type: recall_at_1
      value: 2.094
    - type: recall_at_10
      value: 15.546
    - type: recall_at_100
      value: 47.764
    - type: recall_at_1000
      value: 84.461
    - type: recall_at_3
      value: 5.994
    - type: recall_at_5
      value: 8.967
  - task:
      type: Classification
    dataset:
      type: mteb/toxic_conversations_50k
      name: MTEB ToxicConversationsClassification
      config: default
      split: test
      revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c
    metrics:
    - type: accuracy
      value: 69.92240000000001
    - type: ap
      value: 14.16088899225379
    - type: f1
      value: 54.04609416028299
  - task:
      type: Classification
    dataset:
      type: mteb/tweet_sentiment_extraction
      name: MTEB TweetSentimentExtractionClassification
      config: default
      split: test
      revision: d604517c81ca91fe16a244d1248fc021f9ecee7a
    metrics:
    - type: accuracy
      value: 60.764006791171475
    - type: f1
      value: 61.06042158638947
  - task:
      type: Clustering
    dataset:
      type: mteb/twentynewsgroups-clustering
      name: MTEB TwentyNewsgroupsClustering
      config: default
      split: test
      revision: 6125ec4e24fa026cec8a478383ee943acfbd5449
    metrics:
    - type: v_measure
      value: 49.37015403955057
  - task:
      type: PairClassification
    dataset:
      type: mteb/twittersemeval2015-pairclassification
      name: MTEB TwitterSemEval2015
      config: default
      split: test
      revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1
    metrics:
    - type: cos_sim_accuracy
      value: 86.8510460749836
    - type: cos_sim_ap
      value: 76.13675917697662
    - type: cos_sim_f1
      value: 69.72121212121213
    - type: cos_sim_precision
      value: 64.48430493273543
    - type: cos_sim_recall
      value: 75.8839050131926
    - type: dot_accuracy
      value: 82.2793109614353
    - type: dot_ap
      value: 61.68231214221829
    - type: dot_f1
      value: 59.873802290254716
    - type: dot_precision
      value: 53.73322147651006
    - type: dot_recall
      value: 67.59894459102902
    - type: euclidean_accuracy
      value: 86.78548012159504
    - type: euclidean_ap
      value: 75.72625794456354
    - type: euclidean_f1
      value: 70.13506753376687
    - type: euclidean_precision
      value: 66.66666666666666
    - type: euclidean_recall
      value: 73.98416886543535
    - type: manhattan_accuracy
      value: 86.78548012159504
    - type: manhattan_ap
      value: 75.68264053123454
    - type: manhattan_f1
      value: 70.11952191235059
    - type: manhattan_precision
      value: 66.38378123526638
    - type: manhattan_recall
      value: 74.30079155672823
    - type: max_accuracy
      value: 86.8510460749836
    - type: max_ap
      value: 76.13675917697662
    - type: max_f1
      value: 70.13506753376687
  - task:
      type: PairClassification
    dataset:
      type: mteb/twitterurlcorpus-pairclassification
      name: MTEB TwitterURLCorpus
      config: default
      split: test
      revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf
    metrics:
    - type: cos_sim_accuracy
      value: 89.20712539294446
    - type: cos_sim_ap
      value: 86.227146559573
    - type: cos_sim_f1
      value: 78.8050795036932
    - type: cos_sim_precision
      value: 74.7085201793722
    - type: cos_sim_recall
      value: 83.37696335078533
    - type: dot_accuracy
      value: 86.59525749990297
    - type: dot_ap
      value: 79.7714972191685
    - type: dot_f1
      value: 73.45451896105789
    - type: dot_precision
      value: 69.70891239715135
    - type: dot_recall
      value: 77.62550046196489
    - type: euclidean_accuracy
      value: 88.92575775216362
    - type: euclidean_ap
      value: 85.58942167175054
    - type: euclidean_f1
      value: 78.03423522915516
    - type: euclidean_precision
      value: 74.76193835084996
    - type: euclidean_recall
      value: 81.60609793655682
    - type: manhattan_accuracy
      value: 88.92769821865176
    - type: manhattan_ap
      value: 85.58316068024254
    - type: manhattan_f1
      value: 78.03337843933242
    - type: manhattan_precision
      value: 76.23384253819037
    - type: manhattan_recall
      value: 79.91992608561749
    - type: max_accuracy
      value: 89.20712539294446
    - type: max_ap
      value: 86.227146559573
    - type: max_f1
      value: 78.8050795036932

# LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders

> LLM2Vec is a simple recipe for converting decoder-only LLMs into text encoders. It consists of three steps: 1) enabling bidirectional attention, 2) masked next token prediction (MNTP), and 3) unsupervised contrastive learning. The model can then be further fine-tuned to achieve state-of-the-art performance.

- **Repository:** https://github.com/McGill-NLP/llm2vec
- **Paper:** https://arxiv.org/abs/2404.05961
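
The three steps in the summary can be illustrated with a small, self-contained PyTorch sketch. This is not the llm2vec implementation. The helper names (`causal_mask`, `bidirectional_mask`, `mntp_labels`, `simcse_loss`), the `-100` ignore index (the default `ignore_index` of `torch.nn.functional.cross_entropy`), and the 0.05 temperature are illustrative assumptions. The sketch shows each step in turn:

- Step 1 appears as a change of attention mask.
- Step 2 is a label layout in which the output at position i-1 is trained to recover a token masked at position i.
- Step 3 is a SimCSE-style loss in which two dropout-perturbed encodings of the same text form the positive pair.

```python
# Illustrative sketch of the three LLM2Vec steps; not the llm2vec implementation.
import torch
import torch.nn.functional as F


def causal_mask(seq_len: int) -> torch.Tensor:
    # Decoder-only default: query position i may attend only to key positions j <= i.
    return torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))


def bidirectional_mask(seq_len: int) -> torch.Tensor:
    # Step 1: drop the causal constraint so every token attends to every token.
    return torch.ones(seq_len, seq_len, dtype=torch.bool)


def mntp_labels(original_ids: torch.Tensor, is_masked: torch.Tensor,
                ignore_index: int = -100) -> torch.Tensor:
    # Step 2 (MNTP): a token masked at position i is predicted from the output
    # at position i - 1, as in ordinary next-token prediction. Every other
    # position gets ignore_index so that it does not contribute to the loss.
    labels = torch.full_like(original_ids, ignore_index)
    shifted_targets = torch.where(
        is_masked[:, 1:],
        original_ids[:, 1:],
        torch.full_like(original_ids[:, 1:], ignore_index),
    )
    labels[:, :-1] = shifted_targets
    return labels


def simcse_loss(z1: torch.Tensor, z2: torch.Tensor,
                temperature: float = 0.05) -> torch.Tensor:
    # Step 3: z1[k] and z2[k] encode the same text under different dropout
    # masks (the positive pair); all other rows in the batch act as negatives.
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    logits = z1 @ z2.T / temperature
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)


if __name__ == "__main__":
    print(causal_mask(3).int())
    print(bidirectional_mask(3).int())
    ids = torch.tensor([[11, 12, 13, 14]])
    masked = torch.tensor([[False, False, True, False]])
    print(mntp_labels(ids, masked))
    z = torch.randn(4, 8)
    print(simcse_loss(z, z + 0.01 * torch.randn(4, 8)))
```

Take the input `[11, 12, 13, 14]` with position 2 masked. `mntp_labels` places the target `13` at position 1 and the ignore index everywhere else, so only the prediction made just before the masked token contributes to the loss.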

## Installation

```bash
pip install llm2vec
```

## Usage

```python
from llm2vec import LLM2Vec

import torch
from transformers import AutoTokenizer, AutoModel, AutoConfig
from peft import PeftModel

# Load the base Sheared-LLaMA model, along with custom code that enables
# bidirectional attention in decoder-only LLMs. The MNTP LoRA weights are
# then merged into the base model.
tokenizer = AutoTokenizer.from_pretrained(
    "McGill-NLP/LLM2Vec-Sheared-LLaMA-mntp"
)
config = AutoConfig.from_pretrained(
    "McGill-NLP/LLM2Vec-Sheared-LLaMA-mntp", trust_remote_code=True
)
model = AutoModel.from_pretrained(
    "McGill-NLP/LLM2Vec-Sheared-LLaMA-mntp",
    trust_remote_code=True,
    config=config,
    torch_dtype=torch.bfloat16,
    device_map="cuda" if torch.cuda.is_available() else "cpu",
)
model = PeftModel.from_pretrained(
    model,
    "McGill-NLP/LLM2Vec-Sheared-LLaMA-mntp",
)
model = model.merge_and_unload()  # This can take several minutes on CPU

# Load the supervised model. This loads the trained LoRA weights on top of the
# MNTP model, so the final weights are: base model + MNTP (LoRA) + supervised (LoRA).
model = PeftModel.from_pretrained(
    model, "McGill-NLP/LLM2Vec-Sheared-LLaMA-mntp-supervised"
)

# Wrapper for encoding and pooling operations
l2v = LLM2Vec(model, tokenizer, pooling_mode="mean", max_length=512)

# Encode queries, each paired with an instruction
instruction = (
    "Given a web search query, retrieve relevant passages that answer the query:"
)
queries = [
    [instruction, "how much protein should a female eat"],
    [instruction, "summit define"],
]
q_reps = l2v.encode(queries)

# Encode documents. Documents do not need an instruction.
documents = [
    "As a general guideline, the CDC's average requirement of protein for women ages 19 to 70 is 46 grams per day. But, as you can see from this chart, you'll need to increase that if you're expecting or training for a marathon. Check out the chart below to see how much protein you should be eating each day.",
    "Definition of summit for English Language Learners. : 1 the highest point of a mountain : the top of a mountain. : 2 the highest level. : 3 a meeting or series of meetings between the leaders of two or more governments.",
]
d_reps = l2v.encode(documents)

# Compute cosine similarity between every query and every document
q_reps_norm = torch.nn.functional.normalize(q_reps, p=2, dim=1)
d_reps_norm = torch.nn.functional.normalize(d_reps, p=2, dim=1)
cos_sim = torch.mm(q_reps_norm, d_reps_norm.transpose(0, 1))

print(cos_sim)
"""
tensor([[0.6500, 0.1291],
        [0.0916, 0.4733]])
"""
```
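
With `pooling_mode="mean"`, the per-token hidden states are reduced to a single vector per text, and the example then compares those vectors by cosine similarity. The sketch below shows both operations in plain PyTorch. The function names `mean_pool` and `rank_documents` are hypothetical and not part of the llm2vec API. The real `LLM2Vec` wrapper may also differ in details, for example in which tokens (such as the instruction) are included in the average.

```python
# Hypothetical helpers illustrating mean pooling and cosine ranking;
# not part of the llm2vec API.
import torch
import torch.nn.functional as F


def mean_pool(hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # hidden_states: (batch, seq_len, dim); attention_mask: (batch, seq_len), 1 = real token.
    mask = attention_mask.unsqueeze(-1).to(hidden_states.dtype)
    summed = (hidden_states * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1.0)  # avoid division by zero for all-padding rows
    return summed / counts


def rank_documents(q_reps: torch.Tensor, d_reps: torch.Tensor) -> torch.Tensor:
    # Cosine similarity is the dot product of L2-normalized vectors, as in the example above.
    cos_sim = F.normalize(q_reps, p=2, dim=1) @ F.normalize(d_reps, p=2, dim=1).T
    # For each query, document indices ordered from most to least similar.
    return cos_sim.argsort(dim=1, descending=True)


if __name__ == "__main__":
    hidden = torch.tensor([[[1.0, 1.0], [3.0, 3.0], [100.0, 100.0]]])
    mask = torch.tensor([[1, 1, 0]])  # the last position is padding and is ignored
    print(mean_pool(hidden, mask))
```

Applied to the `cos_sim` matrix printed above, `cos_sim.argsort(dim=1, descending=True)` ranks the protein passage first for the first query (0.6500 vs. 0.1291). It ranks the summit definition first for the second query (0.4733 vs. 0.0916).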

## Questions

If you have any questions about the code, feel free to email Parishad (`[email protected]`) and Vaibhav (`[email protected]`).