Tags: Image-Text-to-Text, Transformers, Safetensors, lora, Inference Endpoints
Commit 7c1495b, committed by takarajordan (1 parent: a093459)

Update README.md

Files changed (1)
  1. README.md +30 -1
README.md CHANGED
```diff
@@ -8,4 +8,33 @@ tags:
 - lora
 datasets:
 - Multimodal-Fatima/FGVC_Aircraft_train
----
+---
+# pixtral_aerial_VQA_adapter
+
+## Model Details
+
+- **Type**: LoRA Adapter
+- **Total Parameters**: 6,225,920
+- **Memory Usage**: 23.75 MB
+- **Precisions**: torch.float32
+- **Layer Types**:
+- lora_A: 40
+- lora_B: 40
+
```
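The Model Details figures are internally consistent, and this is easy to check. A `torch.float32` value takes 4 bytes, so 6,225,920 parameters occupy 24,903,680 bytes. That is exactly 23.75 MiB, so the card's "MB" here means binary mebibytes of 1,048,576 bytes. Every LoRA-adapted module carries one `lora_A` and one `lora_B` matrix, so the 40 + 40 layer entries presumably correspond to 40 adapted modules.

The sketch below reproduces this arithmetic. It assumes every parameter is stored in float32, as the card lists. `adapter_footprint` is a hypothetical helper written for illustration and is not part of any library.

```python
# Sanity-check the adapter size reported in the model card.
# Assumption: all 6,225,920 parameters are stored as float32 (4 bytes each).

BYTES_PER_FLOAT32 = 4
MIB = 1024 * 1024  # the card's "MB" figure matches binary mebibytes


def adapter_footprint(total_params: int, num_pairs: int) -> dict:
    """Hypothetical helper: memory and per-module average for a LoRA adapter."""
    total_bytes = total_params * BYTES_PER_FLOAT32
    return {
        "bytes": total_bytes,
        "mib": total_bytes / MIB,
        # Each adapted module contributes one lora_A and one lora_B matrix.
        "avg_params_per_module": total_params / num_pairs,
    }


stats = adapter_footprint(total_params=6_225_920, num_pairs=40)
print(stats["bytes"])                  # 24903680
print(stats["mib"])                    # 23.75
print(stats["avg_params_per_module"])  # 155648.0
```

For rank r, an adapted linear layer with input width d_in and output width d_out adds r·(d_in + d_out) parameters. The average of 155,648 per module therefore constrains which rank and module shapes are possible. The card does not state the rank or the target modules, so they are not inferred here.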
```diff
+## Intended Use
+
+- **Primary intended uses**: Processing aerial footage of construction sites for structural and construction surveying.
+- Can also be applied to any detailed VQA use cases with aerial footage.
+
+## Training Data
+
+- **Dataset**:
+1. FloodNet Track 2 dataset
+2. Subset of FGVC Aircraft dataset
+3. Custom dataset of 10 image-caption pairs created using Pixtral
+
+## Training Procedure
+
+- **Training method**: LoRA (Low-Rank Adaptation)
+- **Base model**: Ertugrul/Pixtral-12B-Captioner-Relaxed
+- **Training hardware**: Nebius-hosted NVIDIA H100 machine
```
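The Training Procedure lists LoRA (Low-Rank Adaptation) as the training method. The following is a minimal pure-Python sketch of one LoRA-adapted linear layer, to make the `lora_A`/`lora_B` entries in Model Details concrete. The base weight W stays frozen, and the layer adds a low-rank update: y = W x + (alpha / r) · B (A x).

Following the original LoRA recipe, A gets a small random initialization and B starts at zero. As a result, the adapted layer initially reproduces the base layer exactly. The `LoRALinear` class, the toy 2 × 3 weight, rank `r=2`, and `alpha=4` are illustrative assumptions. The card does not give the rank, alpha, or target modules used for this adapter.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LoRALinear:
    """Hypothetical LoRA layer computing y = W x + (alpha / r) * B (A x)."""

    def __init__(self, weight, r=2, alpha=4, seed=0):
        rng = random.Random(seed)
        self.weight = weight  # frozen base weight, shape d_out x d_in
        d_out, d_in = len(weight), len(weight[0])
        self.scale = alpha / r
        # lora_A: r x d_in, small Gaussian init; lora_B: d_out x r, zero init.
        self.lora_A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.lora_B = [[0.0] * r for _ in range(d_out)]

    def trainable_params(self):
        # Only the low-rank matrices are trained; W is excluded.
        return sum(len(row) for row in self.lora_A) + sum(len(row) for row in self.lora_B)

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.lora_B, matvec(self.lora_A, x))
        return [b + self.scale * u for b, u in zip(base, update)]


W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]  # d_out = 2, d_in = 3
layer = LoRALinear(W, r=2, alpha=4)
x = [1.0, 2.0, 3.0]
print(layer.forward(x))          # [7.0, -1.0]: B is zero, so output equals W x
print(layer.trainable_params())  # 10 = r * (d_in + d_out) = 2 * (3 + 2)
```

Only `lora_A` and `lora_B` would receive gradient updates during training, and W is never modified. This is why an adapter like this one stores only the low-rank matrices and must be applied on top of its base model, here Ertugrul/Pixtral-12B-Captioner-Relaxed.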