metadata
tags:
  - text2text-generation
  - definition-modeling
metrics:
  - rouge
model-index:
  - name: mt0-definition-en-xl
    results: []
language:
  - en
widget:
  - text: He ate a sweet apple. What is the definition of apple?
    example_title: Definition generation
  - text: >-
      The paper contains a number of original ideas about color perception. What
      is the definition of original?
    example_title: Definition generation
license: cc-by-sa-4.0
datasets:
  - marksverdhei/wordnet-definitions-en-2021

mt0-definition-en-xl

This model is a version of mt0-xl fine-tuned for definition generation on three English datasets: WordNet, CoDWoE and Oxford.

It achieves the following results on the evaluation set:

  • Loss: 1.7210
  • Rouge1: 41.5067
  • Rouge2: 23.7149
  • Rougel: 39.138
  • Rougelsum: 39.1647
  • Gen Len: 15.1578
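The widget examples in the metadata suggest the input format: a usage sentence followed by the question "What is the definition of <word>?". The sketch below builds such a prompt and passes it to the model through the standard Transformers seq2seq classes. The repository id `ltgoslo/mt0-definition-en-xl`, the helper names `build_prompt` and `generate_definition`, and the `max_new_tokens=32` limit are assumptions for illustration and are not stated in this card.

```python
# Assumed repository id; adjust if the checkpoint lives elsewhere.
MODEL_ID = "ltgoslo/mt0-definition-en-xl"


def build_prompt(example: str, target: str) -> str:
    """Append the definition question from the widget examples to a usage sentence."""
    return f"{example.strip()} What is the definition of {target.strip()}?"


def generate_definition(example: str, target: str, max_new_tokens: int = 32) -> str:
    """Generate a definition of `target` as it is used in `example`.

    Hypothetical helper: downloads the checkpoint on first call.
    """
    # Imported lazily so build_prompt works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(build_prompt(example, target), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_definition("He ate a sweet apple.", "apple")` sends the first widget prompt to the model. Loading an XL checkpoint needs several gigabytes of memory, so a GPU is advisable.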

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 128
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 20.0
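The total batch sizes follow from the per-device values above, and the linear scheduler decays the learning rate from its initial value to zero over the run. A minimal sketch of both calculations follows. The total of 27400 optimizer steps is taken from the last row of the training results, and `warmup_steps=0` is an assumption, since no warmup setting is listed.

```python
train_batch_size = 4              # per device
eval_batch_size = 4               # per device
num_devices = 8
gradient_accumulation_steps = 4

# Effective batch per optimizer step: per-device batch x devices x accumulation.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
# Evaluation does not accumulate gradients, so only devices multiply.
total_eval_batch_size = eval_batch_size * num_devices


def linear_lr(step: int, total_steps: int = 27400,
              base_lr: float = 5e-5, warmup_steps: int = 0) -> float:
    """Learning rate under a linear schedule (warmup_steps=0 is an assumption)."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


print(total_train_batch_size, total_eval_batch_size)
print(linear_lr(0), linear_lr(13700), linear_lr(27400))
```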

Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Gen Len |
|---|---|---|---|---|---|---|---|---|
| 2.1171 | 1.0 | 1370 | 1.8175 | 27.0261 | 8.6429 | 25.2826 | 25.2952 | 11.8798 |
| 1.8186 | 2.0 | 2740 | 1.7112 | 29.1583 | 9.9747 | 27.3432 | 27.3647 | 11.7919 |
| 1.643 | 3.0 | 4110 | 1.6442 | 30.9045 | 11.2256 | 28.7826 | 28.788 | 12.4125 |
| 1.499 | 4.0 | 5480 | 1.5978 | 32.1126 | 12.6674 | 29.97 | 29.9843 | 12.3129 |
| 1.3772 | 5.0 | 6850 | 1.5720 | 33.6113 | 13.8451 | 31.3468 | 31.3599 | 12.6887 |
| 1.2742 | 6.0 | 8220 | 1.5564 | 34.4899 | 15.1005 | 32.3177 | 32.3291 | 12.2003 |
| 1.1785 | 7.0 | 9590 | 1.5466 | 35.4729 | 16.2035 | 33.2166 | 33.2295 | 12.4487 |
| 1.0941 | 8.0 | 10960 | 1.5571 | 36.4885 | 17.5396 | 34.2494 | 34.2759 | 12.7543 |
| 1.0202 | 9.0 | 12330 | 1.5541 | 37.4019 | 18.5568 | 35.1341 | 35.1473 | 12.8603 |
| 0.9552 | 10.0 | 13700 | 1.5642 | 38.127 | 19.4057 | 35.9008 | 35.9163 | 12.6987 |
| 0.8963 | 11.0 | 15070 | 1.5772 | 38.5073 | 20.0584 | 36.3304 | 36.3399 | 12.7052 |
| 0.8443 | 12.0 | 16440 | 1.5955 | 39.2323 | 20.9237 | 36.9863 | 37.0049 | 13.0395 |
| 0.7982 | 13.0 | 17810 | 1.6089 | 39.7947 | 21.6422 | 37.5619 | 37.5815 | 13.1400 |
| 0.7586 | 14.0 | 19180 | 1.6293 | 40.2922 | 22.2301 | 38.0755 | 38.0757 | 12.8589 |
| 0.7234 | 15.0 | 20550 | 1.6493 | 40.6358 | 22.5355 | 38.3523 | 38.3659 | 13.1102 |
| 0.6946 | 16.0 | 21920 | 1.6701 | 40.7708 | 22.906 | 38.5037 | 38.5174 | 13.1035 |
| 0.6688 | 17.0 | 23290 | 1.6902 | 41.0847 | 23.1663 | 38.8126 | 38.8149 | 13.2951 |
| 0.6484 | 18.0 | 24660 | 1.7005 | 41.2075 | 23.3967 | 38.9529 | 38.9545 | 13.2707 |
| 0.6342 | 19.0 | 26030 | 1.7116 | 41.2454 | 23.5187 | 39.0203 | 39.0396 | 13.2173 |
| 0.6234 | 20.0 | 27400 | 1.7210 | 41.3073 | 23.5691 | 39.0662 | 39.074 | 13.2558 |
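Validation loss and ROUGE do not peak at the same epoch in the results above, which matters when choosing a checkpoint. The following sketch copies the Validation Loss and Rougel columns per epoch and locates the best epoch under each criterion. The variable names are illustrative, and the card does not say which criterion selected the released checkpoint.

```python
# (epoch, validation loss, Rougel) copied from the training results above.
results = [
    (1, 1.8175, 25.2826), (2, 1.7112, 27.3432), (3, 1.6442, 28.7826),
    (4, 1.5978, 29.97), (5, 1.5720, 31.3468), (6, 1.5564, 32.3177),
    (7, 1.5466, 33.2166), (8, 1.5571, 34.2494), (9, 1.5541, 35.1341),
    (10, 1.5642, 35.9008), (11, 1.5772, 36.3304), (12, 1.5955, 36.9863),
    (13, 1.6089, 37.5619), (14, 1.6293, 38.0755), (15, 1.6493, 38.3523),
    (16, 1.6701, 38.5037), (17, 1.6902, 38.8126), (18, 1.7005, 38.9529),
    (19, 1.7116, 39.0203), (20, 1.7210, 39.0662),
]

# Epoch with the lowest validation loss.
best_loss_epoch = min(results, key=lambda row: row[1])[0]
# Epoch with the highest Rougel score.
best_rouge_epoch = max(results, key=lambda row: row[2])[0]

print(best_loss_epoch, best_rouge_epoch)  # prints: 7 20
```

Validation loss is lowest at epoch 7 and rises afterwards, while Rougel keeps improving through epoch 20.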

Framework versions

  • Transformers 4.30.2
  • Pytorch 1.13.1+rocm5.2
  • Datasets 2.12.0
  • Tokenizers 0.12.1