# Vit-GPT2-COCO2017Flickr-40k-05
This model is a fine-tuned version of [nlpconnect/vit-gpt2-image-captioning](https://huggingface.co/nlpconnect/vit-gpt2-image-captioning) on an unknown dataset. It achieves the following results on the evaluation set (a usage sketch follows the metrics list):
- Loss: 0.5528
- Rouge1: 44.1624
- Rouge2: 19.6736
- Rougel: 40.3898
- Rougelsum: 40.4029
- Gen Len: 12.263
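The original card does not include a usage snippet. Below is a minimal inference sketch, assuming the standard `VisionEncoderDecoderModel` loading path used by the base model; the image path `example.jpg` is a hypothetical placeholder, and the generation settings are illustrative rather than taken from the card.

```python
from PIL import Image
from transformers import AutoTokenizer, ViTImageProcessor, VisionEncoderDecoderModel

model_id = "NourFakih/Vit-GPT2-COCO2017Flickr-40k-05"
model = VisionEncoderDecoderModel.from_pretrained(model_id)
processor = ViTImageProcessor.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Hypothetical local image; any RGB image works.
image = Image.open("example.jpg").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# Beam search settings are illustrative, not stated in the card.
output_ids = model.generate(pixel_values, max_length=16, num_beams=4)
caption = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(caption)
```

Equivalently, `pipeline("image-to-text", model=model_id)` wraps the same preprocessing and generation steps.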
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a hypothetical code reconstruction follows the list):
- learning_rate: 5e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3.0
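The betas and epsilon above are the Transformers optimizer defaults. Since the actual training script is not published, a hypothetical reconstruction of the corresponding `Seq2SeqTrainingArguments` might look like:

```python
from transformers import Seq2SeqTrainingArguments

# Hypothetical reconstruction of the listed hyperparameters; eval_steps=500 is
# inferred from the results table below, not stated in the card.
training_args = Seq2SeqTrainingArguments(
    output_dir="Vit-GPT2-COCO2017Flickr-40k-05",
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=3.0,
    evaluation_strategy="steps",
    eval_steps=500,
    predict_with_generate=True,  # assumption: needed to compute ROUGE on generated captions
)
```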
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Gen Len |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| 0.1497 | 0.1 | 500 | 0.5462 | 40.1774 | 14.6199 | 36.3335 | 36.3518 | 12.5965 |
| 0.1604 | 0.2 | 1000 | 0.5302 | 41.4714 | 16.0237 | 37.5992 | 37.5915 | 11.914 |
| 0.1631 | 0.3 | 1500 | 0.5436 | 40.3816 | 14.6958 | 36.6109 | 36.6027 | 12.3295 |
| 0.1634 | 0.4 | 2000 | 0.5266 | 40.9484 | 15.9068 | 37.5194 | 37.5088 | 12.033 |
| 0.1576 | 0.5 | 2500 | 0.5544 | 40.373 | 15.012 | 36.5218 | 36.5141 | 12.3345 |
| 0.1599 | 0.6 | 3000 | 0.5425 | 40.7552 | 15.2754 | 37.1059 | 37.1299 | 12.191 |
| 0.291 | 0.7 | 3500 | 0.4545 | 41.5934 | 16.251 | 37.7291 | 37.7113 | 12.0295 |
| 0.2825 | 0.8 | 4000 | 0.4558 | 42.6728 | 17.1703 | 38.8692 | 38.8841 | 12.246 |
| 0.2737 | 0.9 | 4500 | 0.4565 | 43.0036 | 16.8421 | 39.1761 | 39.1693 | 11.7975 |
| 0.2683 | 1.0 | 5000 | 0.4576 | 42.1341 | 16.7973 | 38.2881 | 38.3083 | 11.8655 |
| 0.1687 | 1.1 | 5500 | 0.4996 | 41.7152 | 16.4042 | 37.7724 | 37.7629 | 12.384 |
| 0.168 | 1.2 | 6000 | 0.5046 | 41.6521 | 16.6159 | 37.7915 | 37.7778 | 12.661 |
| 0.1688 | 1.3 | 6500 | 0.5020 | 42.3292 | 17.1408 | 38.5407 | 38.5282 | 11.846 |
| 0.1682 | 1.4 | 7000 | 0.5045 | 42.848 | 17.6905 | 38.9854 | 38.9896 | 12.025 |
| 0.1703 | 1.5 | 7500 | 0.5103 | 42.1175 | 16.7765 | 38.3023 | 38.3199 | 12.4315 |
| 0.1618 | 1.6 | 8000 | 0.5019 | 43.207 | 17.8145 | 39.3822 | 39.3884 | 12.3485 |
| 0.1657 | 1.7 | 8500 | 0.4945 | 42.8399 | 17.8975 | 39.1618 | 39.1951 | 11.8575 |
| 0.1643 | 1.8 | 9000 | 0.5064 | 43.0186 | 17.8969 | 39.2518 | 39.2735 | 12.0095 |
| 0.1654 | 1.9 | 9500 | 0.5011 | 43.2785 | 18.2603 | 39.4479 | 39.4437 | 12.2305 |
| 0.158 | 2.0 | 10000 | 0.4945 | 43.3824 | 18.3183 | 39.3471 | 39.3334 | 12.1495 |
| 0.1096 | 2.1 | 10500 | 0.5520 | 43.5068 | 18.4313 | 39.7084 | 39.7205 | 12.112 |
| 0.1037 | 2.2 | 11000 | 0.5510 | 43.1909 | 18.1204 | 39.1945 | 39.2052 | 12.349 |
| 0.1045 | 2.3 | 11500 | 0.5453 | 42.9965 | 18.4064 | 39.0931 | 39.0868 | 12.1825 |
| 0.1027 | 2.4 | 12000 | 0.5473 | 43.4973 | 18.8697 | 39.944 | 39.9407 | 12.447 |
| 0.1034 | 2.5 | 12500 | 0.5512 | 43.9534 | 19.327 | 40.0946 | 40.0724 | 12.2395 |
| 0.1018 | 2.6 | 13000 | 0.5527 | 43.7136 | 19.1214 | 39.9218 | 39.9274 | 12.3245 |
| 0.0986 | 2.7 | 13500 | 0.5557 | 44.0502 | 19.3213 | 40.0291 | 40.0286 | 12.3345 |
| 0.0953 | 2.8 | 14000 | 0.5510 | 44.0001 | 19.4482 | 40.1204 | 40.1175 | 12.1255 |
| 0.098 | 2.9 | 14500 | 0.5534 | 43.9554 | 19.4673 | 40.1401 | 40.1521 | 12.2395 |
| 0.0947 | 3.0 | 15000 | 0.5528 | 44.1624 | 19.6736 | 40.3898 | 40.4029 | 12.263 |
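The card does not state how the ROUGE scores were computed. A plausible sketch using the `evaluate` library (scores scaled to 0-100 to match the table above; both captions below are invented examples):

```python
import evaluate  # also requires the rouge_score package

rouge = evaluate.load("rouge")
predictions = ["a man riding a horse on the beach"]      # hypothetical generated caption
references = ["a man rides his horse along the beach"]   # hypothetical reference caption
scores = rouge.compute(predictions=predictions, references=references, use_stemmer=True)
print({k: round(v * 100, 4) for k, v in scores.items()})
```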
### Framework versions
- Transformers 4.39.3
- Pytorch 2.1.2
- Datasets 2.18.0
- Tokenizers 0.15.2