gpt-omni committed on
Commit a722089 • 1 Parent(s): 9896323

Update README.md

Files changed (1)
  1. README.md +36 -3
README.md CHANGED
@@ -1,3 +1,36 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ language:
+ - en
+ base_model: Qwen/Qwen2-0.5B
+ ---
+
+
+ <p align="center"><strong style="font-size: 18px;">
+ Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming
+ </strong>
+ </p>
+
+ <p align="center">
+ 🤗 <a href="">Hugging Face</a> | 📖 <a href="https://github.com/gpt-omni/mini-omni">GitHub</a>
+ | 📑 <a href="https://arxiv.org/abs/2408.16725">Technical report</a>
+ </p>
+
+ Mini-Omni is an open-source multimodal large language model that can **hear, talk while thinking**. It features real-time, end-to-end speech input and **streaming audio output** for conversation.
+
+ <p align="center">
+ <img src="frameworkv3.jpg" width="100%"/>
+ </p>
+
+
+ ## Features
+
+ ✅ **Real-time speech-to-speech** conversational capabilities. No extra ASR or TTS models required.
+
+ ✅ **Talking while thinking**, with the ability to generate text and audio at the same time.
+
+ ✅ **Streaming audio output** capabilities.
+
+ ✅ "Audio-to-Text" and "Audio-to-Audio" **batch inference** to further boost performance.
+
+ **NOTE**: Please refer to https://github.com/gpt-omni/mini-omni for more details.