ViDeBERTa: A powerful pre-trained language model for Vietnamese
ViDeBERTa is a new pre-trained monolingual language model for Vietnamese. It comes in three versions, ViDeBERTa_xsmall, ViDeBERTa_base, and ViDeBERTa_large, which are pre-trained on 138GB of high-quality, diverse Vietnamese text using the DeBERTaV3 architecture.
Please check the official repository for more implementation details and updates.
The DeBERTa V3 xsmall model has 12 layers and a hidden size of 384. It has only 22M backbone parameters, with a vocabulary of 128K tokens that introduces 48M parameters in the embedding layer. This model was trained on the CC100 dataset, which contains 138 GB of Vietnamese text.
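The embedding figure above can be sanity-checked with a rough estimate. The sketch below assumes "128K tokens" means exactly 128,000 entries and counts only the token embedding matrix (vocabulary size times hidden size). Both are simplifying assumptions, so the result only approximates the quoted 48M.

```python
def embedding_params(vocab_size: int, hidden_size: int) -> int:
    """Parameters in a token embedding matrix of shape (vocab_size, hidden_size)."""
    return vocab_size * hidden_size


# Assumption: "128K tokens" is read as exactly 128,000 vocabulary entries.
VOCAB_SIZE = 128_000
# Hidden size of the xsmall model, as stated in the text.
HIDDEN_SIZE = 384
# Backbone parameter count quoted in the text (22M).
BACKBONE_PARAMS = 22_000_000

emb = embedding_params(VOCAB_SIZE, HIDDEN_SIZE)
total = BACKBONE_PARAMS + emb

print(f"embedding parameters: {emb / 1e6:.1f}M")   # 49.2M
print(f"backbone + embedding: {total / 1e6:.1f}M")  # 71.2M
```

The estimate is close to the quoted 48M; the small gap presumably comes from rounding in the vocabulary size. Note that the embedding layer holds more than twice as many parameters as the backbone itself.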
Fine-tuning on NLU tasks
We present the dev results on the VLSP POS, PhoNER, and ViQuAD datasets.
Model | #Params | POS | NER | MRC |
---|---|---|---|---|
XLM-R-base | 125M | 96.2 | - | 82.0 |
XLM-R-large | 355M | 96.3 | 93.8 | 87.0 |
PhoBERT-base | 135M | 96.7 | 80.1 | |
PhoBERT-large | 370M | 96.8 | 83.5 | |
ViT5-base | 310M | - | 94.5 | - |
ViT5-large | 866M | - | 93.8 | - |
ViDeBERTa-xsmall | 22M | 96.4 | 93.6 | 81.3 |
ViDeBERTa-base | 86M | 96.8 | 94.5 | 85.7 |
ViDeBERTa-large | 304M | 97.2 | 95.3 | 89.9 |
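For token-level tasks such as POS tagging and NER, fine-tuning usually starts by loading the checkpoint with a fresh classification head. The following is a minimal sketch using the Hugging Face `transformers` Auto classes. The checkpoint name `Fsoft-AIC/videberta-xsmall` and the tag set are assumptions for illustration, not taken from this document; check the official repository for the exact identifiers and label inventories.

```python
# Assumed checkpoint name; verify it against the official repository.
MODEL_ID = "Fsoft-AIC/videberta-xsmall"

# Illustrative tag set only; the real VLSP POS / PhoNER label sets differ.
EXAMPLE_LABELS = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC"]


def build_label_maps(labels):
    """Return (id2label, label2id) dictionaries for a list of tag names."""
    id2label = dict(enumerate(labels))
    label2id = {label: idx for idx, label in id2label.items()}
    return id2label, label2id


def load_for_token_classification(model_id=MODEL_ID, labels=EXAMPLE_LABELS):
    """Load the tokenizer and the model with a new token-classification head."""
    # Imported lazily so build_label_maps works without transformers installed.
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    id2label, label2id = build_label_maps(labels)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForTokenClassification.from_pretrained(
        model_id,
        num_labels=len(labels),
        id2label=id2label,
        label2id=label2id,
    )
    return tokenizer, model
```

Calling `load_for_token_classification()` downloads the weights and returns a model whose classification head is randomly initialised, so it still has to be trained on the target dataset before results like those in the table can be expected.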
Citation
If you find ViDeBERTa useful for your work, please cite the following paper:
@article{dao2023videberta,
title={ViDeBERTa: A powerful pre-trained language model for Vietnamese},
author={Dao Tran, Cong and Pham, Nhut Huy and Nguyen, Anh and Son Hy, Truong and Vu, Tu},
journal={arXiv e-prints},
pages={arXiv--2301},
year={2023}
}