---
license: mit
base_model: roberta-base
tags:
  - topic
  - classification
  - news
  - roberta
metrics:
- accuracy
- f1
- precision
- recall
datasets:
  - dstefa/New_York_Times_Topics
widget:
  - text: >-
      Olympic champion Kostas Kederis today left hospital ahead of his date with IOC inquisitors claiming his innocence and vowing.
    example_title: Sports
  - text: >-
      Although many individuals are doing fever checks to screen for Covid-19, many Covid-19 patients never have a fever.
    example_title: Health and Wellness
  - text: >-
      Twelve myths about Russia's War in Ukraine exposed
    example_title: Crime
model-index:
- name: roberta-base_topic_classification_nyt_news
  results:
    - task:
          name: Text Classification
          type: text-classification
      dataset:
          name: New_York_Times_Topics
          type: News
      metrics:
          - type: F1
            name: F1
            value: 0.91
          - type: accuracy
            name: accuracy
            value: 0.91
          - type: precision
            name: precision
            value: 0.91
          - type: recall
            name: recall
            value: 0.91
pipeline_tag: text-classification
---

# roberta-base_topic_classification_nyt_news

This model is a fine-tuned version of [roberta-base](https://huggingface.co/roberta-base) on the [NYT News dataset](https://www.kaggle.com/datasets/aryansingh0909/nyt-articles-21m-2000-present), which contains 256,000 news titles from articles published from 2000 to the present.
It achieves the following results on the test set of 51,200 examples:
- Accuracy: 0.91
- F1: 0.91
- Precision: 0.91
- Recall: 0.91

## Training data
The training data was labeled with the following classes:

class |Description
-|-
0 |Sports
1 |Arts, Culture, and Entertainment
2 |Business and Finance
3 |Health and Wellness
4 |Lifestyle and Fashion
5 |Science and Technology
6 |Politics
7 |Crime
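
The class table can be written as a small lookup for decoding integer predictions. This is a minimal sketch: the names `ID2LABEL`, `LABEL2ID`, and `decode` are illustrative and not part of the model repository, and the hosted model's own config may already carry an equivalent mapping.

```python
# Mapping copied from the class table above.
# ID2LABEL, LABEL2ID and decode are hypothetical helper names.
ID2LABEL = {
    0: "Sports",
    1: "Arts, Culture, and Entertainment",
    2: "Business and Finance",
    3: "Health and Wellness",
    4: "Lifestyle and Fashion",
    5: "Science and Technology",
    6: "Politics",
    7: "Crime",
}

# Reverse lookup, useful when encoding string labels for training.
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}


def decode(class_id: int) -> str:
    """Return the topic name for an integer class id."""
    return ID2LABEL[class_id]
```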

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 5
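
To illustrate what a linear scheduler with 500 warmup steps does to the learning rate, here is a plain-Python sketch. It assumes the usual shape (linear ramp from 0 to the peak rate, then linear decay to 0), as implemented by `get_linear_schedule_with_warmup` in Transformers, and it assumes a total of 102,400 optimizer steps, the final step count reported in the training results. The function name `linear_lr` is illustrative.

```python
def linear_lr(step: int,
              base_lr: float = 5e-5,
              warmup_steps: int = 500,
              total_steps: int = 102_400) -> float:
    """Learning rate at a given optimizer step (assumed schedule shape)."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay: fall linearly from base_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    for s in (0, 250, 500, 51_450, 102_400):
        print(s, linear_lr(s))
```

Under these assumptions the rate peaks at step 500 and is halved roughly midway through the decay phase.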

### Training results

| Training Loss | Epoch | Step   | Validation Loss | Accuracy | F1     | Precision | Recall |
|:-------------:|:-----:|:------:|:---------------:|:--------:|:------:|:---------:|:------:|
| 0.3192        | 1.0   | 20480  | 0.4078          | 0.8865   | 0.8859 | 0.8892    | 0.8865 |
| 0.2863        | 2.0   | 40960  | 0.4271          | 0.8972   | 0.8970 | 0.8982    | 0.8972 |
| 0.1979        | 3.0   | 61440  | 0.3797          | 0.9094   | 0.9092 | 0.9098    | 0.9094 |
| 0.1239        | 4.0   | 81920  | 0.3981          | 0.9117   | 0.9113 | 0.9114    | 0.9117 |
| 0.1472        | 5.0   | 102400 | 0.4033          | 0.9137   | 0.9135 | 0.9134    | 0.9137 |
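
The table can be read programmatically to compare checkpoints. The sketch below copies the rows above and picks the epoch with the lowest validation loss and the one with the highest accuracy. It also estimates the number of training examples seen per epoch from the step count, assuming a single device and no gradient accumulation (neither is stated in this card). The names `EpochResult` and `RESULTS` are illustrative.

```python
from typing import NamedTuple


class EpochResult(NamedTuple):
    epoch: int
    step: int
    val_loss: float
    accuracy: float


# Values copied from the training results table.
RESULTS = [
    EpochResult(1, 20480, 0.4078, 0.8865),
    EpochResult(2, 40960, 0.4271, 0.8972),
    EpochResult(3, 61440, 0.3797, 0.9094),
    EpochResult(4, 81920, 0.3981, 0.9117),
    EpochResult(5, 102400, 0.4033, 0.9137),
]

best_by_loss = min(RESULTS, key=lambda r: r.val_loss)
best_by_accuracy = max(RESULTS, key=lambda r: r.accuracy)

# Assumption: one device, no gradient accumulation, train_batch_size = 8.
steps_per_epoch = RESULTS[0].step
examples_per_epoch = steps_per_epoch * 8

print("lowest validation loss at epoch", best_by_loss.epoch)
print("highest accuracy at epoch", best_by_accuracy.epoch)
print("estimated training examples per epoch:", examples_per_epoch)
```

Validation loss is lowest at epoch 3 while accuracy keeps improving through epoch 5, so the preferred checkpoint depends on which criterion matters more for the application.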

### Model performance

Class|Precision|Recall|F1|Support
-|-|-|-|-
Sports|0.97|0.98|0.97|6400
Arts, Culture, and Entertainment|0.94|0.95|0.94|6400
Business and Finance|0.85|0.84|0.84|6400
Health and Wellness|0.90|0.93|0.91|6400
Lifestyle and Fashion|0.95|0.95|0.95|6400
Science and Technology|0.89|0.83|0.86|6400
Politics|0.93|0.88|0.90|6400
Crime|0.85|0.93|0.89|6400
 | | | |
accuracy|||0.91|51200
macro avg|0.91|0.91|0.91|51200
weighted avg|0.91|0.91|0.91|51200
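
As a sanity check, the macro averages can be recomputed from the per-class rows. The sketch below copies the rounded values from the table, so the result is only accurate to two decimals. Because every class has the same support (6,400), the weighted average coincides with the macro average. The name `PER_CLASS` is illustrative.

```python
# (precision, recall, f1) per class, copied from the table above.
PER_CLASS = {
    "Sports": (0.97, 0.98, 0.97),
    "Arts, Culture, and Entertainment": (0.94, 0.95, 0.94),
    "Business and Finance": (0.85, 0.84, 0.84),
    "Health and Wellness": (0.90, 0.93, 0.91),
    "Lifestyle and Fashion": (0.95, 0.95, 0.95),
    "Science and Technology": (0.89, 0.83, 0.86),
    "Politics": (0.93, 0.88, 0.90),
    "Crime": (0.85, 0.93, 0.89),
}


def macro_average(column: int) -> float:
    """Unweighted mean of one metric column across all classes."""
    values = [scores[column] for scores in PER_CLASS.values()]
    return sum(values) / len(values)


macro_precision = round(macro_average(0), 2)
macro_recall = round(macro_average(1), 2)
macro_f1 = round(macro_average(2), 2)

print(macro_precision, macro_recall, macro_f1)
```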

### How to use roberta-base_topic_classification_nyt_news with Hugging Face

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

model_id = "dstefa/roberta-base_topic_classification_nyt_news"

# Load the fine-tuned tokenizer and classifier from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
pipe = pipeline("text-classification", model=model, tokenizer=tokenizer)

text = "Kederis proclaims innocence Olympic champion Kostas Kederis today left hospital ahead of his date with IOC inquisitors claiming his innocence and vowing."
print(pipe(text))
# Reported output:
# [{'label': 'Sports', 'score': 0.9989326596260071}]
```

### Framework versions

- Transformers 4.32.1
- PyTorch 2.1.0+cu121
- Datasets 2.12.0
- Tokenizers 0.13.2