
Vit-GPT2-COCO2017Flickr-40k-05

This model is a fine-tuned version of nlpconnect/vit-gpt2-image-captioning on an unknown dataset (the model name suggests a 40k-example mix of COCO 2017 and Flickr captions, but the card does not confirm this). It achieves the following results on the evaluation set:

  • Loss: 0.5528
  • Rouge1: 44.1624
  • Rouge2: 19.6736
  • RougeL: 40.3898
  • RougeLsum: 40.4029
  • Gen Len: 12.263

Model description

More information needed

Intended uses & limitations

More information needed
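
Although the card does not document usage, the base checkpoint follows the standard VisionEncoderDecoder captioning workflow in transformers, so inference might look like the sketch below. It assumes this fine-tune ships the base model's ViT image processor and GPT-2 tokenizer, which the card does not confirm, and the image path is a hypothetical placeholder.

```python
from PIL import Image
from transformers import VisionEncoderDecoderModel, ViTImageProcessor, AutoTokenizer

model_id = "NourFakih/Vit-GPT2-COCO2017Flickr-40k-05"
model = VisionEncoderDecoderModel.from_pretrained(model_id)
processor = ViTImageProcessor.from_pretrained(model_id)  # assumed bundled with the checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)      # assumed bundled with the checkpoint

image = Image.open("example.jpg").convert("RGB")  # hypothetical input image
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# Cap generation near the observed mean caption length (~12 tokens).
output_ids = model.generate(pixel_values, max_length=16)
caption = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(caption)
```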

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto training arguments follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 3.0
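
For reproduction, these values map onto transformers' Seq2SeqTrainingArguments roughly as below. The output directory, the 500-step eval cadence (inferred from the results table), and predict_with_generate are assumptions rather than documented settings; the listed Adam betas and epsilon are the optimizer defaults, so they need no explicit arguments.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="Vit-GPT2-COCO2017Flickr-40k-05",  # hypothetical
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=3.0,
    evaluation_strategy="steps",  # assumed: the table reports eval every 500 steps
    eval_steps=500,               # assumed from the results table
    predict_with_generate=True,   # assumed: required to compute ROUGE during eval
)
```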

Training results

| Training Loss | Epoch | Step  | Validation Loss | Rouge1  | Rouge2  | RougeL  | RougeLsum | Gen Len |
|:-------------:|:-----:|:-----:|:---------------:|:-------:|:-------:|:-------:|:---------:|:-------:|
| 0.1497        | 0.1   | 500   | 0.5462          | 40.1774 | 14.6199 | 36.3335 | 36.3518   | 12.5965 |
| 0.1604        | 0.2   | 1000  | 0.5302          | 41.4714 | 16.0237 | 37.5992 | 37.5915   | 11.914  |
| 0.1631        | 0.3   | 1500  | 0.5436          | 40.3816 | 14.6958 | 36.6109 | 36.6027   | 12.3295 |
| 0.1634        | 0.4   | 2000  | 0.5266          | 40.9484 | 15.9068 | 37.5194 | 37.5088   | 12.033  |
| 0.1576        | 0.5   | 2500  | 0.5544          | 40.373  | 15.012  | 36.5218 | 36.5141   | 12.3345 |
| 0.1599        | 0.6   | 3000  | 0.5425          | 40.7552 | 15.2754 | 37.1059 | 37.1299   | 12.191  |
| 0.291         | 0.7   | 3500  | 0.4545          | 41.5934 | 16.251  | 37.7291 | 37.7113   | 12.0295 |
| 0.2825        | 0.8   | 4000  | 0.4558          | 42.6728 | 17.1703 | 38.8692 | 38.8841   | 12.246  |
| 0.2737        | 0.9   | 4500  | 0.4565          | 43.0036 | 16.8421 | 39.1761 | 39.1693   | 11.7975 |
| 0.2683        | 1.0   | 5000  | 0.4576          | 42.1341 | 16.7973 | 38.2881 | 38.3083   | 11.8655 |
| 0.1687        | 1.1   | 5500  | 0.4996          | 41.7152 | 16.4042 | 37.7724 | 37.7629   | 12.384  |
| 0.168         | 1.2   | 6000  | 0.5046          | 41.6521 | 16.6159 | 37.7915 | 37.7778   | 12.661  |
| 0.1688        | 1.3   | 6500  | 0.5020          | 42.3292 | 17.1408 | 38.5407 | 38.5282   | 11.846  |
| 0.1682        | 1.4   | 7000  | 0.5045          | 42.848  | 17.6905 | 38.9854 | 38.9896   | 12.025  |
| 0.1703        | 1.5   | 7500  | 0.5103          | 42.1175 | 16.7765 | 38.3023 | 38.3199   | 12.4315 |
| 0.1618        | 1.6   | 8000  | 0.5019          | 43.207  | 17.8145 | 39.3822 | 39.3884   | 12.3485 |
| 0.1657        | 1.7   | 8500  | 0.4945          | 42.8399 | 17.8975 | 39.1618 | 39.1951   | 11.8575 |
| 0.1643        | 1.8   | 9000  | 0.5064          | 43.0186 | 17.8969 | 39.2518 | 39.2735   | 12.0095 |
| 0.1654        | 1.9   | 9500  | 0.5011          | 43.2785 | 18.2603 | 39.4479 | 39.4437   | 12.2305 |
| 0.158         | 2.0   | 10000 | 0.4945          | 43.3824 | 18.3183 | 39.3471 | 39.3334   | 12.1495 |
| 0.1096        | 2.1   | 10500 | 0.5520          | 43.5068 | 18.4313 | 39.7084 | 39.7205   | 12.112  |
| 0.1037        | 2.2   | 11000 | 0.5510          | 43.1909 | 18.1204 | 39.1945 | 39.2052   | 12.349  |
| 0.1045        | 2.3   | 11500 | 0.5453          | 42.9965 | 18.4064 | 39.0931 | 39.0868   | 12.1825 |
| 0.1027        | 2.4   | 12000 | 0.5473          | 43.4973 | 18.8697 | 39.944  | 39.9407   | 12.447  |
| 0.1034        | 2.5   | 12500 | 0.5512          | 43.9534 | 19.327  | 40.0946 | 40.0724   | 12.2395 |
| 0.1018        | 2.6   | 13000 | 0.5527          | 43.7136 | 19.1214 | 39.9218 | 39.9274   | 12.3245 |
| 0.0986        | 2.7   | 13500 | 0.5557          | 44.0502 | 19.3213 | 40.0291 | 40.0286   | 12.3345 |
| 0.0953        | 2.8   | 14000 | 0.5510          | 44.0001 | 19.4482 | 40.1204 | 40.1175   | 12.1255 |
| 0.098         | 2.9   | 14500 | 0.5534          | 43.9554 | 19.4673 | 40.1401 | 40.1521   | 12.2395 |
| 0.0947        | 3.0   | 15000 | 0.5528          | 44.1624 | 19.6736 | 40.3898 | 40.4029   | 12.263  |
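
The ROUGE columns appear to be F-measure scores scaled by 100, as produced by a typical compute_metrics hook. A minimal sketch with the evaluate library follows; the exact metric script used for this run is not documented, and the predictions and references are hypothetical placeholders.

```python
import evaluate

rouge = evaluate.load("rouge")
predictions = ["a dog runs across a grassy field"]   # hypothetical model output
references = ["a dog is running through the grass"]  # hypothetical ground truth

scores = rouge.compute(predictions=predictions, references=references)
# Keys match the table columns: rouge1, rouge2, rougeL, rougeLsum.
print({k: round(v * 100, 4) for k, v in scores.items()})
```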

Framework versions

  • Transformers 4.39.3
  • PyTorch 2.1.2
  • Datasets 2.18.0
  • Tokenizers 0.15.2