Edit model card

You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

Log in or Sign Up to review the conditions and access this model content.

aLLMA-Large

Note: This model is not a fine-tuned version of BERT, we have simply used the same architecture.

Citation

If you use the dataset, please cite the following paper:

@inproceedings{isbarov-etal-2024-open,
    title = "Open foundation models for {A}zerbaijani language",
    author = "Isbarov, Jafar  and
      Huseynova, Kavsar  and
      Mammadov, Elvin  and
      Hajili, Mammad  and
      Ataman, Duygu",
    editor = {Ataman, Duygu  and
      Derin, Mehmet Oguz  and
      Ivanova, Sardana  and
      K{\"o}ksal, Abdullatif  and
      S{\"a}lev{\"a}, Jonne  and
      Zeyrek, Deniz},
    booktitle = "Proceedings of the First Workshop on Natural Language Processing for Turkic Languages (SIGTURK 2024)",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand and Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.sigturk-1.2",
    pages = "18--28",
    abstract = "The emergence of multilingual large language models has enabled the development of language understanding and generation systems in Azerbaijani. However, most of the production-grade systems rely on cloud solutions, such as GPT-4. While there have been several attempts to develop open foundation models for Azerbaijani, these works have not found their way into common use due to a lack of systemic benchmarking. This paper encompasses several lines of work that promote open-source foundation models for Azerbaijani. We introduce (1) a large text corpus for Azerbaijani, (2) a family of encoder-only language models trained on this dataset, (3) labeled datasets for evaluating these models, and (4) extensive evaluation that covers all major open-source models with Azerbaijani support.",
}

https:/arxiv.org/abs/2407.02337

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 256
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 10000
  • num_epochs: 10
  • mixed_precision_training: Native AMP

Framework versions

  • Transformers 4.42.0.dev0
  • Pytorch 2.3.1+cu121
  • Datasets 2.19.2
  • Tokenizers 0.19.1
Downloads last month
26
Safetensors
Model size
369M params
Tensor type
F32
·
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Model tree for allmalab/bert-large-aze

Finetuned
(103)
this model

Collection including allmalab/bert-large-aze