takara-ai/pixtral_aerial_VQA_adapter

Image-Text-to-Text


takarajordan committed 27 days ago

Commit 7c1495b • 1 Parent(s): a093459

Update README.md

Files changed (1)

README.md +30 -1


@@ -8,4 +8,33 @@ tags:
 - lora
 datasets:
 - Multimodal-Fatima/FGVC_Aircraft_train
----

+---
+# pixtral_aerial_VQA_adapter
+## Model Details
+- **Type**: LoRA Adapter
+- **Total Parameters**: 6,225,920
+- **Memory Usage**: 23.75 MB
+- **Precisions**: torch.float32
+- **Layer Types**:
+  - lora_A: 40
+  - lora_B: 40
+## Intended Use
+- **Primary intended uses**: Processing aerial footage of construction sites for structural and construction surveying.
+- Can also be applied to any detailed VQA use cases with aerial footage.
+## Training Data
+- **Dataset**:
+  1. FloodNet Track 2 dataset
+  2. Subset of FGVC Aircraft dataset
+  3. Custom dataset of 10 image-caption pairs created using Pixtral
+## Training Procedure
+- **Training method**: LoRA (Low-Rank Adaptation)
+- **Base model**: Ertugrul/Pixtral-12B-Captioner-Relaxed
+- **Training hardware**: Nebius-hosted NVIDIA H100 machine
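
Review note: the Memory Usage figure added under Model Details can be cross-checked against the parameter count and precision listed beside it. The sketch below is not part of the commit. It assumes every parameter is stored as `torch.float32` (4 bytes per element, as the README states) and that the reported "MB" is measured in binary mebibytes (1024 * 1024 bytes). The helper name `adapter_size_mib` is hypothetical.

```python
# Figures copied from the Model Details section added in this commit.
TOTAL_PARAMS = 6_225_920
BYTES_PER_PARAM = 4  # torch.float32 uses 4 bytes per element


def adapter_size_mib(num_params: int, bytes_per_param: int) -> float:
    """Return the raw weight size in MiB (hypothetical helper, not from the repo)."""
    return num_params * bytes_per_param / (1024 ** 2)


size_mib = adapter_size_mib(TOTAL_PARAMS, BYTES_PER_PARAM)
print(f"{size_mib:.2f} MiB")  # 23.75 MiB
```

Under those assumptions the result matches the 23.75 listed in the README, which suggests the figure is in MiB; in decimal megabytes the same weights come to about 24.90 MB.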