czczup committed on
Commit
a86d552
1 Parent(s): 1f98fd5

Update README.md

  1. README.md +15 -11
README.md CHANGED
@@ -12,38 +12,42 @@ pipeline_tag: visual-question-answering
12
 
13
  # Model Card for InternVL-Chat-V1.5
14
 
15
- \[[Paper](https://arxiv.org/abs/2312.14238)\] \[[GitHub](https://github.com/OpenGVLab/InternVL)\] \[[Chat Demo](https://internvl.opengvlab.com/)\]
 
 
 
 
 
 
 
 
 
16
 
17
  ## Model Details
18
- - **Model Type:** vision large language model, multimodal chatbot
19
  - **Model Stats:**
20
  - Architecture: [InternViT-6B-448px-V1-5](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-5) + MLP + [InternLM2-Chat-20B](https://huggingface.co/internlm/internlm2-chat-20b)
 
21
  - Params: 25.5B
22
- - Image size: dynamic resolution, up to 40 tiles of 448 x 448 during inference.
23
- - Number of visual tokens: 256 * (number of tiles + 1)
24
 
25
  - **Training Strategy:**
26
  - Pretraining Stage
27
  - Learnable Component: ViT + MLP
28
- - Data: TODO
29
  - SFT Stage
30
  - Learnable Component: ViT + MLP + LLM
31
- - Data: TODO
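
The visual-token formula in the removed Model Stats lines, `256 * (number of tiles + 1)`, can be checked with a short sketch. The function and constant names below are hypothetical and chosen for illustration; only the formula and the 40-tile maximum come from the model card text.

```python
# Constant taken from the formula quoted in the model card text.
TOKENS_PER_UNIT = 256


def visual_token_count(num_tiles: int) -> int:
    """Return 256 * (num_tiles + 1), the formula quoted in the model card."""
    if num_tiles < 1:
        raise ValueError("num_tiles must be at least 1")
    return TOKENS_PER_UNIT * (num_tiles + 1)


# A single tile, and the 40-tile maximum stated in the removed line.
print(visual_token_count(1))   # 512
print(visual_token_count(40))  # 10496
```

At the 40-tile maximum this gives 10,496 visual tokens per image, which is worth keeping in mind when budgeting the context length of the language model.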
32
 
33
 
34
  ## Model Usage
35
 
36
- We provide a minimal code example to run InternVL-Chat using only the `transformers` library.
37
 
38
  You can also use our [online demo](https://internvl.opengvlab.com/) to try this model quickly.
39
 
40
- Note: If you encounter the error `ImportError: This modeling file requires the following packages that were not found in your environment: fastchat`, run `pip install fschat`.
41
-
42
-
43
  ```python
44
  import json
45
  import os
46
- from internvl.model.internvl_chat import InternVLChatModel
47
  from transformers import AutoTokenizer, AutoModel
48
  from tqdm import tqdm
49
  import torch
 
12
 
13
  # Model Card for InternVL-Chat-V1.5
14
 
15
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/64119264f0f81eb569e0d569/AjPIKaxKLZCbzQRrPELPB.webp" alt="Image Description" width="300" height="300">
16
+
17
+ \[[Paper](https://arxiv.org/abs/2312.14238)\] \[[GitHub](https://github.com/OpenGVLab/InternVL)\] \[[Chat Demo](https://internvl.opengvlab.com/)\] \[[Chinese Explainer](https://zhuanlan.zhihu.com/p/675877376)\]
18
+
19
+ | Model | Date | Download | Note |
20
+ | ----------------------- | ---------- | --------------------------------------------------------------------------- | ---------------------------------- |
21
+ | InternVL-Chat-V1.5 | 2024.04.18 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5) | supports 4K images; strong OCR; approaches the performance of GPT-4V and Gemini Pro on benchmarks such as MMMU, DocVQA, ChartQA, and MathVista (🔥new) |
22
+ | InternVL-Chat-V1.2-Plus | 2024.02.21 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-2-Plus) | more SFT data; stronger performance |
23
+ | InternVL-Chat-V1.2 | 2024.02.11 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-2) | scales the LLM up to 34B |
24
+ | InternVL-Chat-V1.1 | 2024.01.24 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-1) | supports Chinese; stronger OCR |
25
 
26
  ## Model Details
27
+ - **Model Type:** multimodal large language model (MLLM)
28
  - **Model Stats:**
29
  - Architecture: [InternViT-6B-448px-V1-5](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-5) + MLP + [InternLM2-Chat-20B](https://huggingface.co/internlm/internlm2-chat-20b)
30
+ - Image size: dynamic resolution, up to 32 tiles of 448 x 448 (4K resolution) during inference.
31
  - Params: 25.5B
 
 
32
 
33
  - **Training Strategy:**
34
  - Pretraining Stage
35
  - Learnable Component: ViT + MLP
36
+ - Data: Please see our technical report.
37
  - SFT Stage
38
  - Learnable Component: ViT + MLP + LLM
39
+ - Data: Please see our technical report.
40
 
41
 
42
  ## Model Usage
43
 
44
+ We provide example code to run InternVL-Chat-V1.5 using `transformers`.
45
 
46
  You can also use our [online demo](https://internvl.opengvlab.com/) to try this model quickly.
47
 
 
 
 
48
  ```python
49
  import json
50
  import os
 
51
  from transformers import AutoTokenizer, AutoModel
52
  from tqdm import tqdm
53
  import torch