abhaykondi committed on
Commit 0dffa44
Parent(s): 46e34b5

Update README.md

Files changed (1): README.md (+69, -0)

README.md CHANGED

license: apache-2.0
---

# Model Card

Veagle significantly improves the textual understanding and interpretation of images. Its distinguishing feature is an architectural change combined with several components: a vision abstractor from mPlugOwl, the Q-Former from InstructBLIP, and the Mistral language model. This combination allows Veagle to better understand and interpret the connection between text and images, achieving state-of-the-art results. Veagle starts from a pre-trained vision encoder and language model and is trained in two stages, which helps the model effectively use information from images and text together.
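
At a high level, the data flow described above can be sketched in PyTorch-style pseudocode. All module, argument, and dimension names below are illustrative placeholders, not the actual Veagle implementation.

```python
import torch
import torch.nn as nn


class VeaglePipelineSketch(nn.Module):
    """Minimal sketch of the Veagle-style pipeline described above:
    vision encoder -> vision abstractor -> Q-Former -> projection -> LLM.
    Every component here is a placeholder, not the real implementation.
    """

    def __init__(self, vision_encoder, abstractor, q_former, llm,
                 vision_dim=1024, llm_dim=4096):
        super().__init__()
        self.vision_encoder = vision_encoder               # pre-trained image encoder
        self.abstractor = abstractor                       # vision abstractor (mPlugOwl-style)
        self.q_former = q_former                           # Q-Former (InstructBLIP-style)
        self.projection = nn.Linear(vision_dim, llm_dim)   # extra projection layer into LLM space
        self.llm = llm                                     # Mistral-style decoder-only language model

    def forward(self, pixel_values, text_embeds):
        patch_features = self.vision_encoder(pixel_values)  # (B, num_patches, vision_dim)
        visual_tokens = self.abstractor(patch_features)     # condensed visual tokens
        queried_tokens = self.q_former(visual_tokens)        # query-conditioned visual features
        visual_embeds = self.projection(queried_tokens)      # mapped into the LLM embedding space
        # Visual embeddings are prepended to the text embeddings and decoded by the LLM.
        inputs_embeds = torch.cat([visual_embeds, text_embeds], dim=1)
        return self.llm(inputs_embeds=inputs_embeds)
```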

Further details about Veagle can be found in this detailed blog post: https://superagi.com/superagi-veagle/

## Key Contributions

- Veagle surpasses most state-of-the-art (SOTA) models on major benchmarks, outperforming competitors across a range of tasks and domains.
- Using an optimized dataset, Veagle achieves high accuracy and efficiency, demonstrating effective learning from limited data. We meticulously curated a dataset of 3.5 million examples, specifically tailored to enhance visual representation learning.
- Veagle's architecture is a unique blend of components, including a vision abstractor inspired by mPlugOwl, the Q-Former module from InstructBLIP, and the Mistral language model. This architecture, complemented by an additional projection layer and architectural refinements, enables Veagle to excel in multimodal tasks.

## Training

- Trained by: SuperAGI Team
- Hardware: 8 x NVIDIA A100 SXM (80 GB)
- LLM: Mistral 7B
- Vision Encoder: mPLUG-OWL2
- Duration of pretraining: 12 hours
- Duration of finetuning: 25 hours
- Number of epochs in pretraining: 3
- Number of epochs in finetuning: 2
- Batch size in pretraining: 8
- Batch size in finetuning: 10
- Learning Rate: 1e-5
- Weight Decay: 0.05
- Optimizer: AdamW
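
For reference, the optimizer settings listed above map onto a standard PyTorch AdamW configuration. The placeholder module below only makes the snippet self-contained; it is not Veagle's training code.

```python
import torch
import torch.nn as nn

# Placeholder module so the snippet runs on its own; in practice the
# trainable Veagle parameters would be passed to the optimizer instead.
model = nn.Linear(4096, 4096)

# AdamW with the learning rate and weight decay listed above.
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-5,
    weight_decay=0.05,
)
```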

## Steps to try

```bash
# 1. Clone the repository
git clone https://github.com/superagi/Veagle
cd Veagle
```

```bash
# 2. Run the installation script
source venv/bin/activate
chmod +x install.sh
./install.sh
```

```bash
# 3. Run inference on an example image
python evaluate.py --answer_qs \
    --model_name veagle_mistral \
    --img_path images/food.jpeg \
    --question "Is the food given in the image healthy or not?"
```
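
The command in step 3 handles a single image. To run the same question over a folder of images, a small wrapper like the one below works; it is only a convenience sketch that reuses the flags shown above, and the folder path and question are example values.

```python
import subprocess
from pathlib import Path

# Loop the documented evaluate.py command over every .jpeg in a folder,
# reusing the flags shown in step 3. Paths and the question are examples.
for img in sorted(Path("images").glob("*.jpeg")):
    subprocess.run(
        [
            "python", "evaluate.py", "--answer_qs",
            "--model_name", "veagle_mistral",
            "--img_path", str(img),
            "--question", "Describe the contents of this image.",
        ],
        check=True,
    )
```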

## Evaluation

![Veagle evaluation results](https://cdn-uploads.huggingface.co/production/uploads/65a8fe900dba6b99a0164a47/bBBFaYI6maW_DKci9nl6L.jpeg)

## The SuperAGI team

Rajat Chawla, Arkajith Dutta, Tushar Jha, Anmol Gautam, Ayush Vatsal,
Sukrit Chatterji, Adarsh Jha, Mukunda NS, Ishaan Bhola