Model Card for Kushtrim/norbert3-large-norsk-sentiment-sst2

Model Description

This model performs sentiment analysis on Norwegian text. It builds on the BERT architecture, which represents each word in the context of the sentence around it, and is fine-tuned from the LTG NorBERT 3 Large model to improve performance on Norwegian. It classifies the sentiment of a text as either positive or negative.

Intended Use

Primary Use: Sentiment analysis for Norwegian text.
Target Audience: Data scientists, NLP practitioners, researchers, and businesses interested in understanding sentiment in Norwegian-language texts.
Application Examples: Analyzing customer feedback, social media monitoring, market research.

Training Data

The model is trained on a version of the SST2 (Stanford Sentiment Treebank 2) dataset that has been machine-translated into Norwegian. The original English dataset consists of sentences from movie reviews, each annotated as positive or negative, and covers both colloquial and formal language. The machine translation aimed to retain the sentiment and linguistic nuances of the original sentences while adapting them to Norwegian. However, translation inaccuracies may affect how the model understands and classifies sentiment in some cases.

Limitations

The model might not perform well on dialects or slang. It may misjudge sentiment in complex sentences where the overall meaning depends on wider context. Performance might degrade on texts from domains not represented in the training set, which consists of movie reviews.

Ethical Considerations

Care should be taken not to use the model to amplify biases present in the training data. The model should not be used for manipulative or harmful purposes, such as influencing political elections.

How to Use the Model

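from transformers import pipeline, AutoModelForSequenceClassification, AutoTokenizer

# trust_remote_code=True allows transformers to run the custom model code
# that ships with this checkpoint.
tokenizer = AutoTokenizer.from_pretrained("Kushtrim/norbert3-large-norsk-sentiment-sst2", trust_remote_code=True)
model = AutoModelForSequenceClassification.from_pretrained("Kushtrim/norbert3-large-norsk-sentiment-sst2", trust_remote_code=True)

# Wrap the model and tokenizer in a text-classification pipeline.
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

# "This was a beautiful film"
text = "Dette var en vakker film"

# The result is a list with one dict per input, holding a label and a score.
output = classifier(text)

print(output)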
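
For uses such as analyzing customer feedback, the per-text predictions usually need to be aggregated. The following is a minimal sketch of one way to do that, not part of the model itself. It assumes the predictions arrive as a list of dicts with "label" and "score" keys, which is the shape a text-classification pipeline returns for a list of inputs. The helper name summarize_predictions, the min_score parameter, the sample data, and the label names "positive" and "negative" are all hypothetical; the labels this model actually emits can be checked in model.config.id2label.

```python
from collections import Counter


def summarize_predictions(predictions, min_score=0.5):
    """Count predicted labels and list the indices of low-confidence items.

    `predictions` is a list of dicts with "label" and "score" keys.
    `min_score` is an illustrative threshold, not a recommended value.
    """
    counts = Counter(p["label"] for p in predictions)
    low_confidence = [
        i for i, p in enumerate(predictions) if p["score"] < min_score
    ]
    return {"counts": dict(counts), "low_confidence": low_confidence}


# Hypothetical predictions standing in for classifier([text_1, text_2, text_3]).
# The label names are placeholders and may differ from the model's real labels.
sample = [
    {"label": "positive", "score": 0.98},
    {"label": "negative", "score": 0.91},
    {"label": "positive", "score": 0.55},
]

summary = summarize_predictions(sample, min_score=0.6)
print(summary)
```

Items listed under low_confidence are candidates for manual review, which may help in the cases described under Limitations, such as dialect, slang, or out-of-domain text.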
