metadata
license: mit
base_model:
- mistralai/Pixtral-12B-2409
pipeline_tag: image-text-to-text
library_name: transformers
tags:
- lora
datasets:
- Multimodal-Fatima/FGVC_Aircraft_train
- takara-ai/FloodNet_2021-Track_2_Dataset_HF
pixtral_aerial_VQA_adapter
Model Details
- Type: LoRA Adapter
- Total Parameters: 6,225,920
- Memory Usage: 23.75 MiB (6,225,920 parameters × 4 bytes each)
- Precisions: torch.float32
- Layer Types:
  - lora_A: 40
  - lora_B: 40
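The figures above can be cross-checked with a little arithmetic. The sketch below uses only the numbers listed in Model Details. The helper `lora_pair_params` and the rank and projection shape passed to it are illustrative assumptions: this card does not state the LoRA rank or the target modules, and the combination shown is just one that happens to reproduce the per-pair count.

```python
# Figures taken from the Model Details list.
TOTAL_PARAMS = 6_225_920
LORA_PAIRS = 40            # 40 lora_A matrices and 40 lora_B matrices
BYTES_PER_FLOAT32 = 4


def lora_pair_params(rank: int, d_in: int, d_out: int) -> int:
    """Parameters in one LoRA pair: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


# Size of the adapter weights in float32, expressed in MiB (2**20 bytes).
size_mib = TOTAL_PARAMS * BYTES_PER_FLOAT32 / 2**20

# Parameters per lora_A/lora_B pair, assuming all 40 pairs have the same shape.
params_per_pair = TOTAL_PARAMS // LORA_PAIRS

print(f"adapter size: {size_mib:.2f} MiB")        # 23.75
print(f"parameters per pair: {params_per_pair}")  # 155648

# ASSUMPTION (not stated in this card): rank 8 on a 5120 -> 14336 projection
# is one hypothetical configuration that matches the per-pair count.
assert lora_pair_params(8, 5120, 14336) == params_per_pair
```

The reported memory figure therefore corresponds to MiB rather than decimal megabytes, and the parameter count is consistent with a single adapted projection per pair.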
Intended Use
- Primary intended use: processing aerial footage of construction sites for structural and construction surveying.
- Can also be applied to other detailed visual question answering (VQA) tasks on aerial footage.
Training Data
- Datasets:
  - FloodNet Track 2 dataset
  - A subset of the FGVC Aircraft dataset
  - A custom dataset of 10 image-caption pairs created using Pixtral
Training Procedure
- Training method: LoRA (Low-Rank Adaptation)
- Base model: Ertugrul/Pixtral-12B-Captioner-Relaxed, itself a fine-tune of mistralai/Pixtral-12B-2409 (the base_model listed in the metadata)
- Training hardware: Nebius-hosted NVIDIA H100 machine
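A minimal sketch of how an adapter like this one might be attached to its base model with `peft`. Several things here are assumptions rather than details from this card: the function name `load_adapted_model` is hypothetical, the adapter repository ID must be supplied by the caller because the card does not give a namespace, the base checkpoint is assumed to load through `LlavaForConditionalGeneration`, and `bfloat16` is chosen only to reduce memory (the adapter itself is stored in float32). `device_map="auto"` additionally requires the `accelerate` package.

```python
def load_adapted_model(
    adapter_id: str,
    base_id: str = "Ertugrul/Pixtral-12B-Captioner-Relaxed",
):
    """Load the base model named in this card and attach LoRA adapter weights.

    adapter_id is the Hub repo ID or local path of the adapter; it is not
    specified in this card, so the caller has to provide it.
    """
    # Imported lazily so this module can be imported without the heavy
    # dependencies installed.
    import torch
    from peft import PeftModel
    from transformers import AutoProcessor, LlavaForConditionalGeneration

    processor = AutoProcessor.from_pretrained(base_id)

    # ASSUMPTION: the base checkpoint is compatible with the Llava model class.
    base = LlavaForConditionalGeneration.from_pretrained(
        base_id,
        torch_dtype=torch.bfloat16,  # assumption: reduced precision for inference
        device_map="auto",
    )

    # Wrap the base model with the LoRA weights and switch to inference mode.
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()
    return model, processor
```

Calling `load_adapted_model("<namespace>/pixtral_aerial_VQA_adapter")` with the real namespace filled in would download the full 12B base model, so a GPU with enough memory is needed.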
Citation
@misc{rahnemoonfar2020floodnet,
  title={FloodNet: A High Resolution Aerial Imagery Dataset for Post Flood Scene Understanding},
  author={Maryam Rahnemoonfar and Tashnim Chowdhury and Argho Sarkar and Debvrat Varshney and Masoud Yari and Robin Murphy},
  year={2020},
  eprint={2012.02951},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  doi={10.48550/arXiv.2012.02951}
}