---
license: cc-by-4.0
language:
  - en
pipeline_tag: text-classification
tags:
  - roberta-large
  - topic
  - news
widget:
  - text: >-
      Diplomatic efforts to deal with the world’s two wars — the civil war in
      Spain and the undeclared Chinese - Japanese conflict — received sharp
      setbacks today.
  - text: >-
      WASHINGTON. AP. A decisive development appeared in the offing in the
      tug-of-war between the federal government and the states over the
      financing of relief.
  - text: >-
      A frantic bride called the Rochester Gas and Electric corporation to
      complain that her new refrigerator “freezes ice cubes too fast.”
---

Fine-tuned RoBERTa-large for detecting news on politics

Model Description

This model is a fine-tuned RoBERTa-large for classifying whether news articles are about politics.

How to Use

```python
from transformers import pipeline

# Load the fine-tuned politics classifier from the Hugging Face Hub
classifier = pipeline("text-classification", model="dell-research-harvard/topic-politics")
classifier("Kennedy wins election")
```
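The call above classifies a single headline. For many articles, a minimal sketch might batch the inputs and keep only those predicted as political. It assumes a recent `transformers` release in which the text-classification pipeline accepts `batch_size` and passes `truncation` through to the tokenizer. The helper names `select_political` and `classify_and_filter` are hypothetical, and so is the label string `"LABEL_1"`: this card does not document the model's label names, so check `classifier.model.config.id2label` before relying on it.

```python
from typing import Dict, List

# ASSUMPTION: the label string for the "politics" class is not documented in this card.
# "LABEL_1" is a hypothetical placeholder; confirm it via classifier.model.config.id2label.
POSITIVE_LABEL = "LABEL_1"


def select_political(
    texts: List[str],
    predictions: List[Dict],
    positive_label: str = POSITIVE_LABEL,
    min_score: float = 0.5,
) -> List[str]:
    """Keep texts whose top predicted label is positive_label with score >= min_score."""
    return [
        text
        for text, pred in zip(texts, predictions)
        if pred["label"] == positive_label and pred["score"] >= min_score
    ]


def classify_and_filter(texts: List[str], batch_size: int = 8, min_score: float = 0.5) -> List[str]:
    """Run the politics classifier over a batch of texts and return those labelled political."""
    from transformers import pipeline  # lazy import: select_political works without transformers

    classifier = pipeline("text-classification", model="dell-research-harvard/topic-politics")
    print(classifier.model.config.id2label)  # inspect the real label names before trusting POSITIVE_LABEL
    # truncation=True cuts long articles to the tokenizer's maximum input length
    predictions = classifier(texts, batch_size=batch_size, truncation=True)
    return select_political(texts, predictions, min_score=min_score)
```

Calling `classify_and_filter(["Kennedy wins election", ...])` downloads the model on first use. Raising `min_score` keeps only higher-confidence positive predictions, which generally trades recall for precision.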

Training data

The model was trained on a hand-labelled sample of data from the NEWSWIRE dataset.

| Split | Size |
|-------|------|
| Train | 2418 |
| Dev   | 498  |
| Test  | 1473 |

Test set results

| Metric    | Result |
|-----------|--------|
| F1        | 0.8492 |
| Accuracy  | 0.9593 |
| Precision | 0.9086 |
| Recall    | 0.7972 |
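As a quick consistency check on these results, F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch recomputes it from the reported precision and recall (the function name `f1_from_precision_recall` is illustrative only); the result agrees with the reported F1 to within the rounding of the four-decimal inputs.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return F1, the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # convention: F1 is 0 when both precision and recall are 0
    return 2 * precision * recall / (precision + recall)


# Values taken from the test set results table
reported = {"precision": 0.9086, "recall": 0.7972, "f1": 0.8492}

recomputed_f1 = f1_from_precision_recall(reported["precision"], reported["recall"])
# Small differences come from the inputs being rounded to four decimals
print(f"recomputed F1 = {recomputed_f1:.4f}, reported F1 = {reported['f1']:.4f}")
```

Because F1 ignores true negatives while accuracy counts them, F1 is the more informative of the two when political articles make up a minority of the test set.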

Citation Information

You can cite this dataset using:

```bibtex
@misc{silcock2024newswirelargescalestructureddatabase,
      title={Newswire: A Large-Scale Structured Database of a Century of Historical News},
      author={Emily Silcock and Abhishek Arora and Luca D'Amico-Wong and Melissa Dell},
      year={2024},
      eprint={2406.09490},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2406.09490},
}
```

Applications

We applied this model to a century of historical news articles. The resulting classifications for all articles are available in the NEWSWIRE dataset.