---
datasets:
- rotten_tomatoes
- sst2
- amazon_polarity
- imdb
- yelp_polarity
language:
- en
tags:
- sentiment
pipeline_tag: text-classification
---
# SentiCSE
This is a RoBERTa-base model trained on the MR (movie review) dataset and fine-tuned for sentiment analysis tasks.
The model is intended for English text.

+ Reference Paper: SentiCSE (COLING 2024, main conference)
+ Git Repo: https://github.com/nayohan/SentiCSE

```python
import torch
from scipy.spatial.distance import cosine
from transformers import AutoTokenizer, AutoModel


tokenizer = AutoTokenizer.from_pretrained("DILAB-HYU/SentiCSE")
model = AutoModel.from_pretrained("DILAB-HYU/SentiCSE")

# Tokenize input texts
texts = [
    "The food is delicious.",
    "The atmosphere of the restaurant is good.",
    "The food at the restaurant is devoid of flavor.",
    "The restaurant lacks a good ambiance."
]
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

# Get the sentence embeddings (pooler output over the first token)
with torch.no_grad():
    embeddings = model(**inputs, output_hidden_states=True, return_dict=True).pooler_output

# Calculate cosine similarities
# Cosine similarities are in [-1, 1]. Higher means more similar
cosine_sim_0_1 = 1 - cosine(embeddings[0], embeddings[1])
cosine_sim_0_2 = 1 - cosine(embeddings[0], embeddings[2])
cosine_sim_0_3 = 1 - cosine(embeddings[0], embeddings[3])

print("Cosine similarity between \"%s\" and \"%s\" is: %.3f" % (texts[0], texts[1], cosine_sim_0_1))
print("Cosine similarity between \"%s\" and \"%s\" is: %.3f" % (texts[0], texts[2], cosine_sim_0_2))
print("Cosine similarity between \"%s\" and \"%s\" is: %.3f" % (texts[0], texts[3], cosine_sim_0_3))

```
Output:

```
Cosine similarity between "The food is delicious." and "The atmosphere of the restaurant is good." is: 0.942
Cosine similarity between "The food is delicious." and "The food at the restaurant is devoid of flavor." is: 0.703
Cosine similarity between "The food is delicious." and "The restaurant lacks a good ambiance." is: 0.656
```
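
Because SentiCSE places sentences with the same sentiment polarity close together in embedding space, the embeddings can also drive a simple similarity-based classifier. The sketch below is a minimal, hypothetical example and not part of the official repository: the anchor sentences and the `predict_sentiment` helper are illustrative assumptions. It labels a new sentence with the sentiment of its nearest anchor by cosine similarity.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("DILAB-HYU/SentiCSE")
model = AutoModel.from_pretrained("DILAB-HYU/SentiCSE")

# Hypothetical labeled anchor sentences (illustrative only, not from the paper)
anchors = {
    "positive": "The movie was wonderful and I enjoyed every minute.",
    "negative": "The movie was boring and a complete waste of time.",
}

def embed(sentences):
    # Encode a list of sentences and return their pooler-output embeddings
    inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        return model(**inputs).pooler_output

def predict_sentiment(text):
    # Assign the label of the most similar anchor (cosine similarity)
    labels = list(anchors.keys())
    anchor_emb = embed([anchors[label] for label in labels])
    text_emb = embed([text])
    sims = torch.nn.functional.cosine_similarity(text_emb, anchor_emb)
    return labels[int(sims.argmax())]

print(predict_sentiment("The food is delicious."))           # expected to lean positive
print(predict_sentiment("The service was painfully slow."))  # expected to lean negative
```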

## BibTeX entry and citation info
Please cite the reference paper if you use this model.

```
@inproceedings{2024SentiCSE,
  title={SentiCSE: A Sentiment-aware Contrastive Sentence Embedding Framework with Sentiment-guided Textual Similarity},
  author={Kim, Jaemin and Na, Yohan and Kim, Kangmin and Lee, Sangrak and Chae, Dong-Kyu},
  booktitle={Proceedings of the 30th International Conference on Computational Linguistics (COLING)},
  year={2024},
}
```