LLaVA-3D
Table of Contents
- Model Summary
- Use
- Training
- Citation
Model Summary
The LLaVA-3D model is a 7B-parameter model trained on LLaVA-3D-Instruct-1M, based on LLaVA-v1.5-7B.
- Repository: ZCMax/LLaVA-3D
- Project Website: zcmax.github.io/projects/LLaVA-3D
- Paper: LLaVA-3D (arXiv:2409.18125)
- Point of Contact: Chenming Zhu
- Languages: English
Use
Intended use
The model was trained on LLaVA-3D-Instruct-1M and can take a single image as input for 2D tasks and posed RGB-D images for 3D tasks.
Feel free to share your generations in the Community tab!
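A minimal loading sketch, assuming LLaVA-3D keeps the loader interface of the upstream LLaVA codebase (`llava.model.builder.load_pretrained_model`); the `model_name` string is an assumption, so consult the ZCMax/LLaVA-3D repository for the exact entry points and for how posed RGB-D inputs are passed:

```python
# Loading sketch; assumes the LLaVA-3D repo keeps the upstream LLaVA loader
# interface. Verify against ZCMax/LLaVA-3D before use.
from llava.model.builder import load_pretrained_model

tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path="ChaimZhu/LLaVA-3D-7B",  # this Hugging Face checkpoint
    model_base=None,
    model_name="llava-3d-7b",           # assumed name string
)
model.eval()  # inference: 2D tasks take a single image, 3D tasks take posed RGB-D frames
```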
Training
Model
- Pretraining Stage: scene-level and region-level caption data, 1 epoch, projector only
- Instruction Tuning Stage: a mixture of 1M high-quality 2D and 3D data, 1 epoch, full model (see the sketch after this list)
- Precision: bfloat16
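The two stages differ mainly in which parameters are updated. The following PyTorch sketch illustrates that freezing pattern; the `mm_projector` module name follows the LLaVA-1.5 convention and is an assumption about LLaVA-3D's internals, not a confirmed attribute name:

```python
import torch

def configure_stage(model: torch.nn.Module, stage: str) -> None:
    """Illustrative freezing pattern for the two training stages above."""
    if stage == "pretrain":
        # Stage 1: only the multimodal projector learns from caption data.
        for name, param in model.named_parameters():
            param.requires_grad = "mm_projector" in name  # assumed module name
    elif stage == "instruction_tuning":
        # Stage 2: the full model is updated on the 1M instruction mixture.
        for param in model.parameters():
            param.requires_grad = True
    model.to(torch.bfloat16)  # the card reports bfloat16 precision
```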
Hardware & Software
- GPUs: 8 × NVIDIA A100 (for training the whole model series)
- Orchestration: Hugging Face Trainer (a configuration sketch follows this list)
- Neural networks: PyTorch
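As a rough illustration of this setup, a Hugging Face `TrainingArguments` configuration matching the card's reported settings (1 epoch, bfloat16) might look as follows; the output path and batch size are placeholders, not values from the card:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="./llava-3d-7b",      # hypothetical path
    num_train_epochs=1,              # both stages run for 1 epoch
    bf16=True,                       # bfloat16 precision, as reported
    per_device_train_batch_size=16,  # placeholder, not from the card
)
```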
Citation
@article{zhu2024llava,
  title={LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness},
  author={Zhu, Chenming and Wang, Tai and Zhang, Wenwei and Pang, Jiangmiao and Liu, Xihui},
  journal={arXiv preprint arXiv:2409.18125},
  year={2024}
}