---
license: apache-2.0
---

# Model Card

Veagle significantly improves the textual understanding and interpretation of images. What sets Veagle apart is its architecture, which combines several components: a vision abstractor from mPLUG-Owl, the Q-Former from InstructBLIP, and the Mistral language model. This combination allows Veagle to better understand and interpret the connection between text and images, achieving state-of-the-art results. Veagle starts from a pre-trained vision encoder and language model and is trained in two stages (pretraining and fine-tuning), which helps the model use information from images and text together effectively.

Further details about Veagle can be found in this blog post: https://superagi.com/superagi-veagle/

## Key Contributions

- Veagle outperforms most state-of-the-art (SOTA) models on major benchmarks across a range of tasks and domains.
- Veagle achieves high accuracy and efficiency while training on a comparatively small, curated dataset: 3.5 million examples selected specifically to improve visual representation learning. This indicates that the model learns effectively from limited data.
- Veagle's architecture combines a vision abstractor inspired by mPLUG-Owl, the Q-Former module from InstructBLIP, and the Mistral language model. An additional projection layer and further architectural refinements complete the design and help Veagle perform well on multimodal tasks.

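The component chain described above can be pictured as a shape-level trace of one image and a text prompt. This is a minimal sketch, not Veagle's implementation. It assumes the components are applied in the order the card lists them (vision encoder, vision abstractor, Q-Former, projection layer, Mistral). Every token count and width is a hypothetical placeholder, except Mistral 7B's 4096-dimensional hidden size.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical sizes chosen for illustration; not read from the Veagle release.
VISION_TOKENS = 1025      # assumed patch tokens (+ CLS) from the vision encoder
VISION_DIM = 1024         # assumed vision encoder width
ABSTRACTOR_QUERIES = 64   # assumed learned queries in the vision abstractor
QFORMER_QUERIES = 32      # assumed Q-Former query tokens
QFORMER_DIM = 768         # assumed Q-Former width
LLM_DIM = 4096            # hidden size of Mistral 7B


@dataclass(frozen=True)
class Shape:
    tokens: int  # sequence length
    dim: int     # embedding width


def vision_encoder() -> Shape:
    # One image becomes a sequence of patch embeddings.
    return Shape(VISION_TOKENS, VISION_DIM)


def vision_abstractor(x: Shape) -> Shape:
    # Learned queries cross-attend to the patch features, compressing the
    # sequence to a fixed number of visual tokens at the same width.
    return Shape(ABSTRACTOR_QUERIES, x.dim)


def q_former(x: Shape) -> Shape:
    # Q-Former query tokens cross-attend to the abstractor output; the output
    # length is set by the number of queries, not by the input length.
    return Shape(QFORMER_QUERIES, QFORMER_DIM)


def projection(x: Shape) -> Shape:
    # Linear layer mapping each visual token into the LLM embedding space.
    return Shape(x.tokens, LLM_DIM)


def llm_input(visual: Shape, num_text_tokens: int) -> Shape:
    # Visual tokens are concatenated with the text token embeddings.
    assert visual.dim == LLM_DIM, "projection must match the LLM width"
    return Shape(visual.tokens + num_text_tokens, LLM_DIM)


def trace(num_text_tokens: int) -> List[Tuple[str, Shape]]:
    """Return (stage, output shape) pairs for one image plus a prompt."""
    steps = [("vision encoder", vision_encoder())]
    steps.append(("vision abstractor", vision_abstractor(steps[-1][1])))
    steps.append(("Q-Former", q_former(steps[-1][1])))
    steps.append(("projection", projection(steps[-1][1])))
    steps.append(("Mistral input", llm_input(steps[-1][1], num_text_tokens)))
    return steps


if __name__ == "__main__":
    for stage, shape in trace(num_text_tokens=20):
        print(f"{stage:>17}: {shape.tokens} tokens x {shape.dim}")
```

In this sketch the abstractor and Q-Former turn a long sequence of patch features into a short, fixed-length visual prefix. The projection layer then maps that prefix to the width the language model expects.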
## Training

- Trained by: SuperAGI Team
- Hardware: 8x NVIDIA A100 SXM (80 GB)
- LLM: Mistral 7B
- Vision Encoder: mPLUG-Owl2
- Duration of pretraining: 12 hours
- Duration of finetuning: 25 hours
- Number of epochs in pretraining: 3
- Number of epochs in finetuning: 2
- Batch size in pretraining: 8
- Batch size in finetuning: 10
- Learning Rate: 1e-5
- Weight Decay: 0.05
- Optimizer: AdamW

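The optimizer settings above (AdamW, learning rate 1e-5, weight decay 0.05) can be illustrated with a single AdamW update. This is a minimal pure-Python sketch of the standard AdamW rule with decoupled weight decay, not SuperAGI's training code. The card does not give betas or epsilon, so PyTorch's AdamW defaults (0.9, 0.999, 1e-8) are assumed.

```python
import math
from typing import List, Tuple

LR = 1e-5            # learning rate from the card
WEIGHT_DECAY = 0.05  # weight decay from the card
# Not stated in the card: PyTorch's AdamW defaults are assumed here.
BETA1, BETA2, EPS = 0.9, 0.999, 1e-8


def adamw_step(
    params: List[float],
    grads: List[float],
    m: List[float],
    v: List[float],
    t: int,
    lr: float = LR,
    weight_decay: float = WEIGHT_DECAY,
) -> Tuple[List[float], List[float], List[float]]:
    """Apply one AdamW update (step t >= 1); return new params, m and v."""
    new_p, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        # Decoupled weight decay: shrink the weight itself, not the gradient.
        p *= 1 - lr * weight_decay
        # Moving averages of the gradient and of its square.
        m_i = BETA1 * m_i + (1 - BETA1) * g
        v_i = BETA2 * v_i + (1 - BETA2) * g * g
        # Bias correction for the zero-initialised moment estimates.
        m_hat = m_i / (1 - BETA1 ** t)
        v_hat = v_i / (1 - BETA2 ** t)
        p -= lr * m_hat / (math.sqrt(v_hat) + EPS)
        new_p.append(p)
        new_m.append(m_i)
        new_v.append(v_i)
    return new_p, new_m, new_v


if __name__ == "__main__":
    params, m, v = [1.0, 2.0], [0.0, 0.0], [0.0, 0.0]
    params, m, v = adamw_step(params, [0.5, 0.0], m, v, t=1)
    print(params)
```

In PyTorch the same settings would be written as `torch.optim.AdamW(model.parameters(), lr=1e-5, weight_decay=0.05)`, where `model` is whatever module is being trained. The decay is applied to the weights directly rather than added to the gradient, so Adam's adaptive denominator does not rescale it. That is the distinguishing feature of AdamW compared with Adam plus L2 regularization.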
## Steps to try

1. Clone the repository:

```bash
git clone https://github.com/superagi/Veagle
cd Veagle
```

2. Run the installation script from inside the virtual environment:

```bash
source venv/bin/activate
chmod +x install.sh
./install.sh
```

3. Ask a question about an image:

```bash
python evaluate.py --answer_qs \
    --model_name veagle_mistral \
    --img_path images/food.jpeg \
    --question "Is the food in the image healthy or not?"
```

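To ask several questions from a script instead of retyping the command from step 3, you can launch it with Python's `subprocess` module. This is a hypothetical helper, not part of the Veagle repository, and `build_command` and `ask` are illustrative names. The sketch makes three assumptions: it runs from inside the cloned `Veagle` directory, the virtual environment is active, and `evaluate.py` prints its answer to standard output.

```python
import subprocess
import sys
from typing import List


def build_command(img_path: str, question: str,
                  model_name: str = "veagle_mistral") -> List[str]:
    """Build the argument list for the evaluate.py call shown in step 3."""
    # Passing a list (no shell) avoids quoting problems with the question text.
    # sys.executable is the interpreter running this script, i.e. the venv's one.
    return [
        sys.executable, "evaluate.py", "--answer_qs",
        "--model_name", model_name,
        "--img_path", img_path,
        "--question", question,
    ]


def ask(img_path: str, question: str) -> str:
    """Run evaluate.py and return whatever it prints to standard output."""
    # check=True raises CalledProcessError if the script exits with an error.
    result = subprocess.run(build_command(img_path, question),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `ask("images/food.jpeg", "Is the food in the image healthy or not?")` runs the same query as step 3. Calling it in a loop lets you query one image with several questions.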
## Evaluation

![Veagle evaluation results](https://cdn-uploads.huggingface.co/production/uploads/65a8fe900dba6b99a0164a47/bBBFaYI6maW_DKci9nl6L.jpeg)

## The SuperAGI team

Rajat Chawla, Arkajit Dutta, Tushar Verma, Adarsh Jha, Anmol Gautam, Ayush Vatsal, Sukrit Chatterjee, Mukunda NS, Ishaan Bhola