s-nlp / ruRoberta-large-paraphrase-v1

Text Classification

sentence-similarity

Inference Endpoints


cointegrated committed on Nov 6, 2022

Commit 2fa46f3 (1 parent: 4f6879b)

Update README.md

Files changed (1)

README.md +24 -1

README.md CHANGED

@@ -46,4 +46,27 @@ set | ROC AUC
 detox         | 0.857112
 paraphraser   | 0.858465
 rupaws_qqp    | 0.859195
-rupaws_wiki   | 0.906121
+rupaws_wiki   | 0.906121
+Example usage:
+```Python
+import torch
+from transformers import AutoModelForSequenceClassification, AutoTokenizer
+model = AutoModelForSequenceClassification.from_pretrained('SkolkovoInstitute/ruRoberta-large-paraphrase-v1')
+tokenizer = AutoTokenizer.from_pretrained('SkolkovoInstitute/ruRoberta-large-paraphrase-v1')
+def get_similarity(text1, text2):
+    """ Predict the probability that two Russian sentences are paraphrases of each other. """
+    with torch.inference_mode():
+        batch = tokenizer(
+            text1, text2,
+            truncation=True, max_length=model.config.max_position_embeddings, return_tensors='pt',
+        ).to(model.device)
+        proba = torch.softmax(model(**batch).logits, -1)
+    return proba[0][1].item()
+print(get_similarity('Я тебя люблю', 'Ты мне нравишься'))  # 0.9798
+print(get_similarity('Я тебя люблю', 'Я тебя ненавижу'))   # 0.0008
+```
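The ROC AUC table in the diff reports, for each evaluation set, how well the predicted paraphrase probability separates true paraphrase pairs from non-paraphrases. The commit does not include the evaluation script, so the following is only a minimal sketch of the metric itself. It assumes you already have one score per sentence pair (for instance from `get_similarity`) and a 0/1 gold label. The helper name `roc_auc` is hypothetical. In practice `sklearn.metrics.roc_auc_score(y_true, y_score)` computes the same quantity.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Hypothetical helper: ROC AUC as the probability that a randomly chosen
    positive pair (label 1) gets a higher score than a randomly chosen negative
    pair (label 0). Ties count as half a win."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    # O(P * N) pairwise comparison: simple and exact, fine for small eval sets.
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy scores, not model outputs. With the real model they would come from
    # something like [get_similarity(a, b) for a, b in pairs].
    print(roc_auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))  # 0.75
```

A value of 0.5 means the ranking is no better than chance, and 1.0 means every paraphrase pair outranks every non-paraphrase pair. The pairwise loop is quadratic in the number of pairs, so for large evaluation sets a sort-based (Mann-Whitney) version or the scikit-learn function is preferable.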
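One detail of the committed example is worth checking. It truncates inputs to `model.config.max_position_embeddings` tokens. In the `transformers` RoBERTa implementation, position ids do not start at 0. Non-padding tokens are numbered from `pad_token_id + 1`, so a config with `max_position_embeddings = 514` and `pad_token_id = 1` can embed at most 512 tokens. These are the usual RoBERTa-large values and are assumed here, so check this checkpoint's `config.json`. The pure-Python sketch below mimics that numbering to show where the limit comes from. The function names are hypothetical, and the code does not call the library.

```python
from typing import List, Sequence


def roberta_position_ids(input_ids: Sequence[int], padding_idx: int = 1) -> List[int]:
    """Hypothetical helper mimicking RoBERTa-style position ids: the k-th
    non-pad token (1-based) gets k + padding_idx; pad tokens get padding_idx."""
    ids: List[int] = []
    count = 0
    for tok in input_ids:
        if tok != padding_idx:
            count += 1
            ids.append(count + padding_idx)
        else:
            ids.append(padding_idx)
    return ids


def max_safe_length(max_position_embeddings: int, padding_idx: int = 1) -> int:
    """Longest unpadded input whose largest position id still fits in the
    embedding table: need length + padding_idx <= max_position_embeddings - 1."""
    return max_position_embeddings - padding_idx - 1


if __name__ == "__main__":
    print(max_safe_length(514))                  # 512
    print(max(roberta_position_ids([5] * 512)))  # 513: fits a 514-row table
    print(max(roberta_position_ids([5] * 514)))  # 515: out of range
```

Under these assumptions, passing `max_length=model.config.max_position_embeddings - 2` would keep long inputs within range. `tokenizer.model_max_length` would also work if the tokenizer config sets it to 512. Short sentences like the ones in the example are unaffected either way.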