Update README.md with benchmark results (#2)
01424c4ff768ab5c93fd15cca763be316a72d344
Co-authored-by: Bowen Li <[email protected]>
README.md
CHANGED
@@ -36,16 +36,34 @@ InternLM2.5 has open-sourced a 7 billion parameter base model and a chat model t

- **Outstanding reasoning capability**: State-of-the-art performance on Math reasoning, surpassing models like Llama3 and Gemma2-9B.

- **1M Context window**: Nearly perfect at finding needles in the haystack with 1M-long context, with leading performance on long-context tasks like LongBench. Try it with [LMDeploy](https://github.com/InternLM/InternLM/blob/main/chat/lmdeploy.md) for 1M-context inference and a [file chat demo](https://github.com/InternLM/InternLM/tree/main/long_context).

- **Stronger tool use**: InternLM2.5 supports gathering information from more than 100 web pages; the corresponding implementation will be released in [Lagent](https://github.com/InternLM/lagent/tree/main) soon. InternLM2.5 also has better tool-use capabilities in instruction following, tool selection, and reflection. See [examples](https://github.com/InternLM/InternLM/blob/main/agent/lagent.md).


## InternLM2.5-7B-Chat-1M

InternLM2.5-7B-Chat-1M is the 1M-long-context version of InternLM2.5-7B-Chat.

### Performance Evaluation

We employed the "*needle in a haystack*" approach to evaluate the model's ability to retrieve information from long texts. The results show that InternLM2.5-7B-Chat-1M can accurately locate key information in documents up to 1M tokens long.

<p align="center">
<img src="https://github.com/libowen2121/InternLM/assets/19970308/2ce3745f-26f5-4a39-bdcd-2075790d7b1d" alt="drawing" width="700"/>
</p>
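To make the evaluation concrete, here is a minimal, self-contained sketch of how a needle-in-a-haystack harness can be constructed. The filler sentence, needle, document size, and scoring rule are illustrative assumptions; the actual evaluation pipeline used for InternLM2.5-7B-Chat-1M is not shown in this README.

```python
# Sketch of a "needle in a haystack" harness: bury a known fact (the
# needle) at a chosen relative depth inside filler text, then score a
# model's answer by whether the expected fact appears in it.

def build_haystack(needle: str, filler: str, total_chars: int, depth: float) -> str:
    """Repeat `filler` up to ~total_chars and insert `needle` at relative `depth` (0..1)."""
    reps = total_chars // len(filler) + 1
    text = (filler * reps)[:total_chars]
    pos = int(total_chars * depth)
    return text[:pos] + needle + text[pos:]

def score_retrieval(model_answer: str, expected: str) -> bool:
    """A retrieval counts as correct if the expected fact appears in the answer."""
    return expected.lower() in model_answer.lower()

needle = "The secret passphrase is 'glacier-42'."
haystack = build_haystack(needle, "The sky was clear over the valley. ", 10_000, 0.5)
prompt = haystack + "\n\nWhat is the secret passphrase?"
# In the real evaluation, prompts like this (scaled to ~1M tokens, with the
# needle swept across depths) would be sent to InternLM2.5-7B-Chat-1M.
```

Sweeping `depth` from 0 to 1 and the document length up to the context limit yields the retrieval-accuracy heatmap shown above.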

We also used the [LongBench](https://github.com/THUDM/LongBench) benchmark to assess long-document comprehension. Our model achieved the best performance in these tests.

<p align="center">
<img src="https://github.com/libowen2121/InternLM/assets/19970308/1e8f7da8-8193-4def-8b06-0550bab6a12f" alt="drawing" width="800"/>
</p>

### LMDeploy

Since Hugging Face Transformers does not directly support inference with a 1M-long context, we recommend using LMDeploy. Conventional usage with Hugging Face Transformers is also shown below.

LMDeploy is a toolkit for compressing, deploying, and serving LLMs, developed by the MMRazor and MMDeploy teams.

Here is an example of 1M-long context inference. **Note: a 1M context length requires 4xA100-80G!**
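A sketch of such a call, assuming LMDeploy's Python `pipeline` API; the engine-config values (session length, tensor-parallel degree, KV-cache ratio) are illustrative assumptions rather than the authors' exact settings, and running it requires the 4xA100-80G setup noted above.

```python
# Sketch of 1M-long context inference via LMDeploy's pipeline API.
# session_len/tp/cache_max_entry_count are assumed tuning values.
from lmdeploy import GenerationConfig, TurbomindEngineConfig, pipeline

backend_config = TurbomindEngineConfig(
    session_len=1048576,        # 1M-token context window
    tp=4,                       # tensor parallelism across 4 GPUs
    cache_max_entry_count=0.7,  # fraction of free GPU memory for the KV cache
)
pipe = pipeline("internlm/internlm2_5-7b-chat-1m", backend_config=backend_config)

# Replace with a genuinely long prompt, e.g. a full document plus a question.
prompt = "Summarize the key findings of the document above."
response = pipe(prompt, gen_config=GenerationConfig(max_new_tokens=256))
print(response.text)
```

The key point is `session_len`: it must cover the full prompt plus generated tokens, which is why 1M-context inference needs the multi-GPU memory budget above.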