Model Card for Kushtrim/norbert3-large-norsk-sentiment-sst2
Model Description
This model is a sentiment classifier for Norwegian text. It is fine-tuned from the LTG NorBERT 3 Large model, which uses the BERT architecture to represent each word in the context of its sentence. The model classifies a text as either positive or negative.
Intended Use
Primary Use: Sentiment analysis for Norwegian text.
Target Audience: Data scientists, NLP practitioners, researchers, and businesses interested in understanding sentiment in Norwegian-language texts.
Application Examples: Analyzing customer feedback, social media monitoring, market research.
Training Data
The model is trained on the SST-2 dataset (the binary version of the Stanford Sentiment Treebank), machine-translated into Norwegian. SST-2 is originally in English and consists of sentences from movie reviews, each annotated as positive or negative. The sentences cover both colloquial and formal language. The machine translation aimed to preserve the sentiment and linguistic nuances of the original sentences while adapting them to Norwegian. Translation inaccuracies may nevertheless affect how the model classifies sentiment in some cases.
Limitations
The model might not perform well on dialects or slang. Its grasp of context might be limited in long or complex sentences. Because the training data consists of movie reviews, performance might degrade on texts from other domains.
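One way to work with these limitations is to treat low-scoring predictions as uncertain and route them to manual review. The sketch below assumes predictions shaped like the output of a transformers text-classification pipeline (a dict with "label" and "score"). The helper name split_by_confidence, the 0.8 threshold, and the placeholder predictions are all hypothetical. The score is a model output rather than a calibrated probability, so any threshold should be tuned on labeled Norwegian data from the target domain.

```python
def split_by_confidence(predictions, threshold=0.8):
    """Split pipeline-style predictions into confident and uncertain groups.

    Each prediction is a dict with a "label" string and a "score" float,
    the shape returned by a transformers text-classification pipeline.
    The default threshold is an arbitrary illustrative value.
    """
    confident = [p for p in predictions if p["score"] >= threshold]
    uncertain = [p for p in predictions if p["score"] < threshold]
    return confident, uncertain


# Made-up placeholder predictions, not real model output.
# The label strings are assumed; the real ones are in model.config.id2label.
placeholder_predictions = [
    {"label": "positive", "score": 0.97},
    {"label": "negative", "score": 0.55},
    {"label": "positive", "score": 0.81},
]

confident, uncertain = split_by_confidence(placeholder_predictions)
print(len(confident), "confident;", len(uncertain), "flagged for manual review")
```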
Ethical Considerations
Care should be taken not to use the model to amplify biases present in the training data. The model should not be used for manipulative or harmful purposes, such as influencing political elections.
Instructions on how to implement and use the model
from transformers import pipeline, AutoModelForSequenceClassification, AutoTokenizer

model_id = "Kushtrim/norbert3-large-norsk-sentiment-sst2"

# trust_remote_code=True lets transformers run the custom model code shipped with the repository.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForSequenceClassification.from_pretrained(model_id, trust_remote_code=True)

# Wrap the model and tokenizer in a text-classification pipeline.
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

text = "Dette var en vakker film"  # "This was a beautiful film"
output = classifier(text)
print(output)  # a list with one dict holding "label" and "score"
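For applications such as customer-feedback analysis, many texts are usually classified at once and the results summarized. A text-classification pipeline also accepts a list of strings and returns one prediction dict per input. The sketch below shows one possible way to aggregate such predictions. The helper label_shares is hypothetical, and the sample predictions are hand-written placeholders standing in for classifier(list_of_texts), with assumed label strings.

```python
from collections import Counter


def label_shares(predictions):
    """Return the fraction of predictions carrying each label.

    `predictions` is a list of dicts with "label" and "score" keys,
    the shape returned by a transformers text-classification pipeline.
    """
    counts = Counter(p["label"] for p in predictions)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


# Hand-written placeholders, not real model output. In practice these would
# come from classifier(list_of_texts). The label strings are assumed; the
# real ones can be read from model.config.id2label.
sample_predictions = [
    {"label": "positive", "score": 0.98},
    {"label": "negative", "score": 0.91},
    {"label": "positive", "score": 0.87},
    {"label": "positive", "score": 0.66},
]

shares = label_shares(sample_predictions)
print(shares)
```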