
	---
	license: apache-2.0
	---

	# Model Card


	Veagle significantly improves the textual understanding and interpretation of images. Its distinguishing feature
	is an architectural change that combines several components: a vision abstractor from mPlugOwl, the Q-Former from
	InstructBLIP, and the Mistral language model. This combination allows Veagle to better understand and interpret the
	connection between text and images, achieving state-of-the-art results. Veagle starts from a pre-trained vision
	encoder and language model and is trained in two stages, which helps the model use information from images and
	text together effectively.

	Further details about Veagle can be found in this detailed blog post: https://superagi.com/superagi-veagle/

	## Key Contributions

	- Veagle surpasses most state-of-the-art (SOTA) models on major benchmarks and outperforms competitors
	across a range of tasks and domains.
	- Using an optimized dataset, Veagle achieves high accuracy and efficiency, which demonstrates that the model
	learns effectively from limited data. We curated a dataset of 3.5 million examples, specifically tailored to
	enhance visual representation learning.
	- Veagle's architecture blends several components: a vision abstractor inspired by mPlugOwl, the Q-Former
	module from InstructBLIP, and the Mistral language model. This architecture, complemented by an additional
	projection layer and architectural refinements, enables Veagle to perform well on multimodal tasks.
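
The component list above can be pictured as a chain of stages that reshape image features until they match the language model's embedding width. The sketch below is only an illustration: the stage order, the `Stage` class, the `trace_shapes` helper, and every token count and dimension are hypothetical placeholders, not Veagle's actual configuration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Stage:
    """One pipeline component, described only by the shape it emits."""
    name: str
    out_tokens: Optional[int]  # None means the token count is unchanged
    out_dim: int


# Assumption: toy sizes and an assumed stage order, chosen for illustration.
PIPELINE: List[Stage] = [
    Stage("vision encoder", out_tokens=16, out_dim=32),
    Stage("vision abstractor (mPlugOwl-style)", out_tokens=8, out_dim=32),
    Stage("Q-Former (InstructBLIP-style)", out_tokens=4, out_dim=24),
    Stage("projection layer", out_tokens=None, out_dim=64),
]
LLM_EMBED_DIM = 64  # toy stand-in for the language model's embedding width


def trace_shapes(pipeline: List[Stage], llm_dim: int) -> List[Tuple[str, int, int]]:
    """Return (stage name, tokens, dim) per stage and check the final width."""
    trace = []
    tokens = 0
    for stage in pipeline:
        if stage.out_tokens is not None:
            tokens = stage.out_tokens
        trace.append((stage.name, tokens, stage.out_dim))
    if trace[-1][2] != llm_dim:
        raise ValueError("final stage must match the language model width")
    return trace


for name, tokens, dim in trace_shapes(PIPELINE, LLM_EMBED_DIM):
    print(f"{name}: {tokens} tokens x {dim} dims")
```

The point of the projection layer in this picture is simply that the visual tokens must end up with the same width as the language model's text embeddings before the two can be processed together.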


	## Training

	- Trained by: SuperAGI Team
	- Hardware: 8 x NVIDIA A100 SXM (80 GB)
	- LLM: Mistral 7B
	- Vision Encoder: mPLUG-Owl2
	- Duration of pretraining: 12 hours
	- Duration of finetuning: 25 hours
	- Number of epochs in pretraining: 3
	- Number of epochs in finetuning: 2
	- Batch size in pretraining: 8
	- Batch size in finetuning: 10
	- Learning Rate: 1e-5
	- Weight Decay: 0.05
	- Optimizer: AdamW
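
As a rough worked example, the settings listed above can be gathered into a small config and used to estimate the compute budget. This is a sketch under two assumptions the card does not state: that the durations are wall-clock hours, and that all 8 GPUs were in use for both stages. The names `StageConfig` and `gpu_hours` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageConfig:
    """Per-stage settings copied from the training list."""
    epochs: int
    batch_size: int
    hours: float  # assumed to be wall-clock duration


# Settings shared by both stages, as listed in the card.
NUM_GPUS = 8
LEARNING_RATE = 1e-5
WEIGHT_DECAY = 0.05
OPTIMIZER = "AdamW"

PRETRAIN = StageConfig(epochs=3, batch_size=8, hours=12)
FINETUNE = StageConfig(epochs=2, batch_size=10, hours=25)


def gpu_hours(stage: StageConfig, num_gpus: int = NUM_GPUS) -> float:
    """GPU-hours for one stage, assuming every GPU is busy throughout."""
    return stage.hours * num_gpus


total = gpu_hours(PRETRAIN) + gpu_hours(FINETUNE)
print(f"pretraining: {gpu_hours(PRETRAIN)} GPU-hours")
print(f"finetuning:  {gpu_hours(FINETUNE)} GPU-hours")
print(f"total:       {total} GPU-hours")
```

Under those assumptions the two stages come to 96 and 200 GPU-hours, or 296 in total.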

	## Steps to try

	1. Clone the repository

	```bash
	git clone https://github.com/superagi/Veagle
	cd Veagle
	```

	2. Run the installation script

	```bash
	# Activate the virtual environment (assumes one already exists at venv/)
	source venv/bin/activate
	chmod +x install.sh
	./install.sh
	```

	3. Ask a question about an image

	```bash
	python evaluate.py --answer_qs \
	--model_name veagle_mistral \
	--img_path images/food.jpeg \
	--question "Is the food in the image healthy or not?"
	```
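
To ask several questions without retyping the command, the invocation above can be assembled in Python. This is a minimal sketch: `build_eval_command` is a hypothetical helper, it uses only the flags shown in the steps above, and it assumes the command is run from the root of the cloned repository.

```python
import shlex
from typing import List


def build_eval_command(
    img_path: str,
    question: str,
    model_name: str = "veagle_mistral",
) -> List[str]:
    """Build the evaluate.py argument list using the flags from the steps."""
    return [
        "python", "evaluate.py", "--answer_qs",
        "--model_name", model_name,
        "--img_path", img_path,
        "--question", question,
    ]


cmd = build_eval_command(
    "images/food.jpeg",
    "Is the food in the image healthy or not?",
)

# Print a shell-safe version of the command for inspection.
print(shlex.join(cmd))

# To execute it from the repository root, pass the list to subprocess:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell-quoting problems when a question contains quotes or other special characters.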

	## Evaluation

	![Veagle evaluation results](https://cdn-uploads.huggingface.co/production/uploads/65a8fe900dba6b99a0164a47/bBBFaYI6maW_DKci9nl6L.jpeg)


	## The SuperAGI team

	Rajat Chawla, Arkajit Dutta, Tushar Verma, Adarsh Jha, Anmol Gautam, Ayush Vatsal,
	Sukrit Chatterjee, Mukunda NS, Ishaan Bhola