Sinhala Headline Generation

This is a text generation task built on the NSINA dataset. The headline dataset is released under the same license as NSINA. The objective is to generate a news headline from the content of a news article.
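
The task can be tried out with a fine-tuned checkpoint. The following is a minimal sketch, not the authors' evaluation code: it assumes the checkpoint `sinhala-nlp/NSINA-Headlines-mbart-large-cc25` loads through the generic `transformers` Auto classes with PyTorch installed, and that the tokenizer and generation defaults stored with the checkpoint are suitable. The helper names `load_model` and `generate_headline`, the 512-token input limit, the beam size, and the output length are illustrative choices, not values taken from the paper.

```python
MODEL_ID = "sinhala-nlp/NSINA-Headlines-mbart-large-cc25"


def load_model(model_id=MODEL_ID):
    """Download the tokenizer and seq2seq model (hypothetical helper)."""
    # Imported here so the module can be imported without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    return tokenizer, model


def generate_headline(article, tokenizer, model,
                      max_input_tokens=512, max_new_tokens=32):
    """Generate one headline for one news article (hypothetical helper)."""
    # Truncate long articles; the limit is an assumption, not from the paper.
    inputs = tokenizer(
        article,
        return_tensors="pt",
        truncation=True,
        max_length=max_input_tokens,
    )
    # Beam search with 4 beams is an illustrative default.
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, num_beams=4
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Call `load_model()` once, then pass the returned tokenizer and model to `generate_headline` together with the text of a Sinhala news article.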

Data

We used the same instances as NSINA 1.0, since every news article in it has a headline. We divided the dataset into training and test sets using a 0.8 split. The data can be loaded into pandas DataFrames with the following code.

from datasets import load_dataset

# Load each split from the Hugging Face Hub and convert it to a pandas DataFrame.
train = load_dataset('sinhala-nlp/NSINA-Headlines', split='train').to_pandas()
test = load_dataset('sinhala-nlp/NSINA-Headlines', split='test').to_pandas()
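
The published train and test splits should be used when comparing against reported results. To illustrate what a 0.8 split means, here is a minimal sketch that assumes 0.8 is the training fraction and that rows are shuffled with a fixed seed before cutting. The function `split_train_test`, the seed, and the toy records are hypothetical; the authors' exact splitting procedure is not stated here.

```python
import random


def split_train_test(rows, train_fraction=0.8, seed=42):
    """Shuffle rows reproducibly and cut them into train and test lists."""
    shuffled = list(rows)                   # copy so the input is not modified
    random.Random(seed).shuffle(shuffled)   # seeded shuffle for reproducibility
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Toy records standing in for news articles.
articles = [{"id": i} for i in range(10)]
train_rows, test_rows = split_train_test(articles)
print(len(train_rows), len(test_rows))  # 8 2
```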

Citation

If you are using the dataset or the models, please cite the following paper.

@inproceedings{Nsina2024,
author={Hettiarachchi, Hansi and Premasiri, Damith and Uyangodage, Lasitha and Ranasinghe, Tharindu},
title={{NSINA: A News Corpus for Sinhala}},
booktitle={The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
year={2024},
month={May},
}