image-captioning-Vit-GPT2-Flickr8k

This model is a fine-tuned version of nlpconnect/vit-gpt2-image-captioning. The training dataset is not recorded in this card, although the model name suggests Flickr8k. It achieves the following results on the evaluation set:

Loss: 0.4624
Rouge1: 38.4609
Rouge2: 14.1268
Rougel: 35.4304
Rougelsum: 35.391
Gen Len: 12.1355
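
To give a feel for what the ROUGE numbers measure, here is a minimal sketch of a ROUGE-1 F1 score on a 0-100 scale. It is a simplified illustration (lowercase whitespace tokens, no stemming, a single reference caption), not the implementation that produced the scores above, so its values would not match them exactly. The function name rouge1_f1 and the example captions are made up for this sketch.

```python
from collections import Counter


def rouge1_f1(prediction: str, reference: str) -> float:
    """Simplified ROUGE-1 F1 on lowercase whitespace tokens, scaled to 0-100."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Clipped unigram overlap: a token counts at most as often as it
    # appears in both the prediction and the reference.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 100.0 * 2 * precision * recall / (precision + recall)


# Hypothetical captions, used only to exercise the function.
score = rouge1_f1("a dog runs on the grass", "a brown dog runs through the grass")
print(round(score, 2))
```

Rouge2 applies the same idea to bigrams, and RougeL is based on the longest common subsequence rather than on independent n-gram matches.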

Model description

More information needed

Intended uses & limitations

More information needed
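
Until this section is filled in, the following is a minimal inference sketch rather than documented usage. It assumes the repository NourFakih/image-captioning-Vit-GPT2-Flickr8k ships image-processor and tokenizer files alongside the weights, as the base model does. The generation settings (max_length=16, num_beams=4) and the helper name caption_image are illustrative choices, not values taken from this card.

```python
MODEL_ID = "NourFakih/image-captioning-Vit-GPT2-Flickr8k"


def caption_image(image_path: str, max_length: int = 16, num_beams: int = 4) -> str:
    """Generate one caption for the image at image_path."""
    # Heavy dependencies are imported lazily so that defining this helper
    # does not require them to be installed.
    import torch
    from PIL import Image
    from transformers import AutoTokenizer, ViTImageProcessor, VisionEncoderDecoderModel

    # Assumption: the repository contains processor and tokenizer files.
    model = VisionEncoderDecoderModel.from_pretrained(MODEL_ID)
    processor = ViTImageProcessor.from_pretrained(MODEL_ID)
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
    with torch.no_grad():
        # Generation settings are illustrative, not taken from the card.
        output_ids = model.generate(
            pixel_values, max_length=max_length, num_beams=num_beams
        )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True).strip()


# Example call (the file name is hypothetical):
# print(caption_image("example.jpg"))
```

The helper reloads the model on every call to keep the sketch short. For repeated use, loading the model, processor, and tokenizer once and reusing them would be the more sensible design.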

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 3.0
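
As a rough sketch, the values above could be expressed as keyword arguments for Seq2SeqTrainingArguments in Transformers. Several details are assumptions: the batch sizes are mapped to per-device values (which presumes a single device), predict_with_generate=True is inferred from the reported ROUGE and Gen Len metrics, the linear schedule is assumed to have no warmup because none is listed, and output_dir and total_steps are placeholders. The names HYPERPARAMETERS, build_training_arguments, and linear_lr are made up for this sketch.

```python
HYPERPARAMETERS = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 4,  # assumption: single device
    "per_device_eval_batch_size": 4,   # assumption: single device
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 3.0,
}


def build_training_arguments(output_dir: str):
    """Build training arguments from the card's hyperparameters."""
    # Imported lazily so the dictionary above can be inspected without
    # Transformers installed.
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(
        output_dir=output_dir,       # placeholder path chosen by the caller
        predict_with_generate=True,  # assumption, inferred from the metrics
        **HYPERPARAMETERS,
    )


def linear_lr(step: int, total_steps: int, base_lr: float = 5e-5) -> float:
    """Linear decay from base_lr to 0, assuming no warmup steps."""
    return base_lr * max(0.0, (total_steps - step) / total_steps)


# Halfway through a hypothetical 24000-step run the rate is half the peak.
print(linear_lr(12000, 24000))
```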

Training results

Training Loss	Epoch	Step	Validation Loss	Rouge1	Rouge2	Rougel	Rougelsum	Gen Len
0.5495	0.06	500	0.4942	35.0812	11.7357	32.4228	32.4251	11.5738
0.4945	0.12	1000	0.4903	35.4943	12.0207	32.8571	32.8486	11.8682
0.4984	0.19	1500	0.4862	35.3652	11.9707	32.8296	32.8126	12.0544
0.4783	0.25	2000	0.4808	36.1048	12.3597	33.4635	33.4504	11.3468
0.4736	0.31	2500	0.4772	35.9342	12.343	33.519	33.495	11.1066
0.4685	0.37	3000	0.4708	36.8985	13.0743	34.3294	34.2978	11.4739
0.4687	0.43	3500	0.4704	36.1934	12.5721	33.4731	33.4671	11.9201
0.4709	0.49	4000	0.4696	36.1822	12.8306	33.4001	33.3673	12.1733
0.4575	0.56	4500	0.4675	37.4471	13.7553	34.5655	34.5384	12.6302
0.4484	0.62	5000	0.4662	36.6786	13.0601	33.9348	33.8999	12.6007
0.4507	0.68	5500	0.4656	36.506	12.7992	34.0665	34.0409	11.4316
0.4445	0.74	6000	0.4628	37.0737	13.3324	34.416	34.3902	12.3211
0.4557	0.8	6500	0.4594	37.3349	13.1633	34.4709	34.4503	12.2522
0.4451	0.87	7000	0.4600	37.3384	13.5699	34.6726	34.6555	12.0494
0.4381	0.93	7500	0.4588	37.6164	13.7855	34.8467	34.8084	12.1347
0.4357	0.99	8000	0.4571	37.2047	13.4341	34.3383	34.3121	12.2670
0.3869	1.05	8500	0.4612	37.684	13.6922	34.9914	34.9721	11.3216
0.377	1.11	9000	0.4616	37.2615	13.2059	34.3375	34.3327	12.3221
0.3736	1.17	9500	0.4607	37.2109	13.1387	34.3923	34.3638	11.8274
0.3801	1.24	10000	0.4617	38.0033	13.7561	35.2434	35.2414	11.6079
0.3816	1.3	10500	0.4599	37.3453	13.622	34.6495	34.639	12.2101
0.377	1.36	11000	0.4619	37.2996	13.4583	34.3777	34.3525	12.3911
0.3745	1.42	11500	0.4604	37.5448	13.3841	34.5785	34.5532	12.2747
0.3785	1.48	12000	0.4568	38.0769	14.0089	35.0744	35.0605	12.3179
0.3675	1.54	12500	0.4587	37.6284	13.8277	34.7837	34.7618	11.8732
0.3731	1.61	13000	0.4554	38.433	14.1461	35.6757	35.6683	11.4294
0.3731	1.67	13500	0.4548	37.9065	13.7526	34.9091	34.8919	12.1241
0.371	1.73	14000	0.4542	38.4064	14.2136	35.4845	35.4671	12.1014
0.3615	1.79	14500	0.4551	38.0695	14.1042	35.162	35.1427	12.1135
0.3687	1.85	15000	0.4550	38.1978	14.1243	35.3107	35.2821	12.2255
0.3711	1.92	15500	0.4532	37.661	13.603	34.7601	34.7467	12.1632
0.3685	1.98	16000	0.4515	38.5727	14.5345	35.5855	35.5585	11.9162
0.3333	2.04	16500	0.4626	38.4657	14.4726	35.6431	35.6119	11.9506
0.3129	2.1	17000	0.4660	38.2002	14.0689	35.1851	35.1748	12.3313
0.3155	2.16	17500	0.4674	37.8919	13.91	34.9167	34.9154	12.4853
0.3134	2.22	18000	0.4644	38.1576	13.9371	35.0486	35.0252	11.9748
0.3167	2.29	18500	0.4653	37.8516	13.9029	34.7959	34.7847	12.5273
0.322	2.35	19000	0.4673	37.9883	14.0127	34.8667	34.841	12.4680
0.312	2.41	19500	0.4641	38.4611	14.238	35.4465	35.417	11.9315
0.3173	2.47	20000	0.4654	38.1477	13.9164	35.1148	35.0905	12.4845
0.3081	2.53	20500	0.4640	38.7153	14.3282	35.7048	35.6923	11.8932
0.3093	2.6	21000	0.4633	38.2932	14.0961	35.2736	35.2308	11.8932
0.3154	2.66	21500	0.4637	38.0708	13.7374	35.0722	35.055	12.1310
0.3096	2.72	22000	0.4630	38.3722	14.041	35.2847	35.2425	12.2591
0.3101	2.78	22500	0.4627	38.6372	14.2961	35.5118	35.4819	12.2836
0.309	2.84	23000	0.4620	38.3596	14.0396	35.3285	35.3	12.3281
0.312	2.9	23500	0.4623	38.4268	14.0768	35.4015	35.3656	12.2208
0.3135	2.97	24000	0.4624	38.4609	14.1268	35.4304	35.391	12.1355
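
The headline metrics at the top of this card match the final row (step 24000), not the best row. Across the full table, validation loss is lowest at step 16000 (0.4515) and Rouge1 is highest at step 20500 (38.7153). Validation loss rises in the third epoch while training loss keeps falling, which is consistent with mild overfitting. The sketch below repeats that comparison on four rows copied from the table and estimates the number of steps per epoch. The estimate of training examples assumes a single device and no gradient accumulation, neither of which the card states.

```python
# (step, epoch, validation_loss, rouge1) copied from four rows of the table.
ROWS = [
    (8000, 0.99, 0.4571, 37.2047),
    (16000, 1.98, 0.4515, 38.5727),
    (20500, 2.53, 0.4640, 38.7153),
    (24000, 2.97, 0.4624, 38.4609),
]

best_by_loss = min(ROWS, key=lambda row: row[2])
best_by_rouge1 = max(ROWS, key=lambda row: row[3])

# The epoch column is rounded to two decimals, so this is only approximate.
last_step, last_epoch = ROWS[-1][0], ROWS[-1][1]
steps_per_epoch = last_step / last_epoch

# Assumption: train_batch_size 4 on one device, no gradient accumulation.
approx_train_examples = steps_per_epoch * 4

print("lowest validation loss at step", best_by_loss[0])
print("highest Rouge1 at step", best_by_rouge1[0])
print("approx. steps per epoch:", round(steps_per_epoch))
print("approx. training examples:", round(approx_train_examples))
```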

Framework versions

Transformers 4.39.3
Pytorch 2.1.2
Datasets 2.18.0
Tokenizers 0.15.2

NourFakih/image-captioning-Vit-GPT2-Flickr8k
