simonJJJ committed on
Commit
eb9e7e6
1 Parent(s): 99bbed2
Files changed (1)
  1. README.md +0 -6
README.md CHANGED
@@ -30,12 +30,6 @@ inference: false
  - **First generalist model support grounding in Chinese**: Detecting bounding boxes through open-domain language expression in both Chinese and English.
  - **Fine-grained recognization and understanding**: Compared to the 224 resolution currently used by other open-source LVLM, the 448 resolution promotes fine-grained text recognition, document QA, and bounding box annotation.
 
- <br>
- <p align="center">
- <img src="assets/demo_vl.gif" width="400"/>
- <p>
- <br>
-
  We release two models of the Qwen-VL series:
  - Qwen-VL: The pre-trained LVLM model uses Qwen-7B as the initialization of the LLM, and [Openclip ViT-bigG](https://github.com/mlfoundations/open_clip) as the initialization of the visual encoder. And connects them with a randomly initialized cross-attention layer. Qwen-VL was trained on about 1.5B image-text paired data. The final image input resolution is 448.
  - Qwen-VL-Chat: A multimodal LLM-based AI assistant, which is trained with alignment techniques.
 