---
tags:
- text2text-generation
- definition-modeling
metrics:
- rouge
model-index:
- name: mt0-definition-en-xl
  results: []
language:
- en
widget:
- text: "He ate a sweet apple. What is the definition of apple?"
  example_title: "Definition generation"
- text: "The paper contains a number of original ideas about color perception. What is the definition of original?"
  example_title: "Definition generation"
license: cc-by-sa-4.0
datasets:
- marksverdhei/wordnet-definitions-en-2021
---

# mt0-definition-en-xl

This model is a version of [mt0-xl](https://huggingface.co/bigscience/mt0-xl) fine-tuned on English WordNet, CodWoE and Oxford.
It achieves the following results on the evaluation set:
- Loss: 1.7210
- Rouge1: 41.5067
- Rouge2: 23.7149
- Rougel: 39.138
- Rougelsum: 39.1647
- Gen Len: 15.1578

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 20.0

### Training results

| Training Loss | Epoch | Step  | Validation Loss | Rouge1  | Rouge2  | Rougel  | Rougelsum | Gen Len |
|:-------------:|:-----:|:-----:|:---------------:|:-------:|:-------:|:-------:|:---------:|:-------:|
| 2.1171        | 1.0   | 1370  | 1.8175          | 27.0261 | 8.6429  | 25.2826 | 25.2952   | 11.8798 |
| 1.8186        | 2.0   | 2740  | 1.7112          | 29.1583 | 9.9747  | 27.3432 | 27.3647   | 11.7919 |
| 1.643         | 3.0   | 4110  | 1.6442          | 30.9045 | 11.2256 | 28.7826 | 28.788    | 12.4125 |
| 1.499         | 4.0   | 5480  | 1.5978          | 32.1126 | 12.6674 | 29.97   | 29.9843   | 12.3129 |
| 1.3772        | 5.0   | 6850  | 1.5720          | 33.6113 | 13.8451 | 31.3468 | 31.3599   | 12.6887 |
| 1.2742        | 6.0   | 8220  | 1.5564          | 34.4899 | 15.1005 | 32.3177 | 32.3291   | 12.2003 |
| 1.1785        | 7.0   | 9590  | 1.5466          | 35.4729 | 16.2035 | 33.2166 | 33.2295   | 12.4487 |
| 1.0941        | 8.0   | 10960 | 1.5571          | 36.4885 | 17.5396 | 34.2494 | 34.2759   | 12.7543 |
| 1.0202        | 9.0   | 12330 | 1.5541          | 37.4019 | 18.5568 | 35.1341 | 35.1473   | 12.8603 |
| 0.9552        | 10.0  | 13700 | 1.5642          | 38.127  | 19.4057 | 35.9008 | 35.9163   | 12.6987 |
| 0.8963        | 11.0  | 15070 | 1.5772          | 38.5073 | 20.0584 | 36.3304 | 36.3399   | 12.7052 |
| 0.8443        | 12.0  | 16440 | 1.5955          | 39.2323 | 20.9237 | 36.9863 | 37.0049   | 13.0395 |
| 0.7982        | 13.0  | 17810 | 1.6089          | 39.7947 | 21.6422 | 37.5619 | 37.5815   | 13.1400 |
| 0.7586        | 14.0  | 19180 | 1.6293          | 40.2922 | 22.2301 | 38.0755 | 38.0757   | 12.8589 |
| 0.7234        | 15.0  | 20550 | 1.6493          | 40.6358 | 22.5355 | 38.3523 | 38.3659   | 13.1102 |
| 0.6946        | 16.0  | 21920 | 1.6701          | 40.7708 | 22.906  | 38.5037 | 38.5174   | 13.1035 |
| 0.6688        | 17.0  | 23290 | 1.6902          | 41.0847 | 23.1663 | 38.8126 | 38.8149   | 13.2951 |
| 0.6484        | 18.0  | 24660 | 1.7005          | 41.2075 | 23.3967 | 38.9529 | 38.9545   | 13.2707 |
| 0.6342        | 19.0  | 26030 | 1.7116          | 41.2454 | 23.5187 | 39.0203 | 39.0396   | 13.2173 |
| 0.6234        | 20.0  | 27400 | 1.7210          | 41.3073 | 23.5691 | 39.0662 | 39.074    | 13.2558 |

### Framework versions

- Transformers 4.30.2
- Pytorch 1.13.1+rocm5.2
- Datasets 2.12.0
- Tokenizers 0.12.1
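
## Usage sketch

The widget examples in this card's front matter all follow one pattern: a context sentence followed by "What is the definition of WORD?". The sketch below shows one way to build such a prompt and pass it to the model through the standard `transformers` seq2seq classes. The helper names `build_prompt` and `generate_definition`, the `max_new_tokens` value, and the use of default greedy decoding are illustrative assumptions, not settings documented here. The card does not state the Hub repository id, so `model_id` is left as a required argument.

```python
def build_prompt(context: str, word: str) -> str:
    """Format an input the same way as the widget examples in this card."""
    return f"{context.strip()} What is the definition of {word}?"


def generate_definition(model_id: str, context: str, word: str,
                        max_new_tokens: int = 32) -> str:
    """Generate a definition; model_id must be supplied by the caller
    because the card does not state the repository id."""
    # Imported lazily so build_prompt can be used without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    inputs = tokenizer(build_prompt(context, word), return_tensors="pt")
    # Assumption: default decoding with a short length cap; tune as needed.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    print(build_prompt("He ate a sweet apple.", "apple"))
```

Calling `generate_definition` downloads the full XL checkpoint, so it is kept separate from the prompt helper, which can be checked on its own.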
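
## Batch-size arithmetic

As a quick check on the training hyperparameters listed above, the effective batch sizes follow from the per-device sizes, the device count, and the gradient accumulation steps. The sketch below reproduces the reported totals. The examples-per-epoch figure is only a rough estimate derived from the 1370 optimizer steps per epoch in the results table; it assumes every step saw a full batch, which the card does not confirm.

```python
# Values copied from the "Training hyperparameters" list.
train_batch_size = 4             # per device
eval_batch_size = 4              # per device
num_devices = 8
gradient_accumulation_steps = 4

# Effective batch size per optimizer step.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
# Evaluation does not accumulate gradients.
total_eval_batch_size = eval_batch_size * num_devices

# Rough estimate only: assumes all 1370 steps per epoch used full batches.
steps_per_epoch = 1370
approx_examples_per_epoch = steps_per_epoch * total_train_batch_size

print(total_train_batch_size)     # 128
print(total_eval_batch_size)      # 32
print(approx_examples_per_epoch)  # 175360
```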