SkyWork committed on
Commit
2f8b822
1 Parent(s): 9e2369d

Update README.md

  1. README.md +54 -0
README.md CHANGED
@@ -1,3 +1,57 @@
# Brief introduction to SkyCode
SkyCode is a multilingual open-source code model released by Singularity-AI. It adopts the GPT-3 model architecture and is trained on a large corpus of code. It supports Java, JavaScript, C, C++, Python, Go, shell, and other mainstream programming languages, and can understand Chinese comments. The model can complete code and solve programming problems, freeing you from routine coding so you can focus on larger problems.

## Project Highlights
1. Technical advantage 1: coverage of multiple programming languages

Different programming languages target problems on different platforms and in different environments, and each has its own reason for existing. SkyCode can generate code not only in widely used languages such as JavaScript, Python, Java, and C, but also covers more than ten programming languages such as PHP, Go, and Swift, so that users of different languages can experience SkyCode's code generation capabilities.

2. Technical advantage 2: optimization for Chinese comments

Pre-training of large models has long been dominated by the English-language community, and code generation models based on GPT-3 share this limitation. Drawing on its experience in developing Chinese-language models, Singularity-AI designed a Chinese encoding method tailored to the characteristics of the Chinese language and better aligned with Chinese usage, which improves the model's ability to understand Chinese comments.

3. Technical advantage 3: excellent problem-solving ability

On the HumanEval dataset, which measures the problem-solving ability of code generation models, SkyCode scores well above the other open-source models compared below.

| model          | pass@1 | pass@10 | pass@100 |
|:-------------- | ------:| -------:| --------:|
| GPT-Neo 1.3B   |  4.79% |   7.47% |   16.30% |
| GPT-Neo 2.7B   |  6.41% |  11.27% |   21.37% |
| GPT-J 6B       | 11.62% |  15.74% |   27.74% |
| SkyCode (2.6B) | 12.84% |  21.07% |   35.97% |

The table shows that SkyCode, with 2.6B parameters, scores well above GPT-Neo 1.3B, which has fewer parameters, and also well above GPT-Neo 2.7B, which has a comparable parameter count. It even outperforms GPT-J 6B, which has more than twice as many parameters. On pass@100, which better reflects the upper limit of problem-solving ability, SkyCode exceeds GPT-J 6B by 8.23 percentage points (35.97% versus 27.74%).
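
The README does not say how the pass@k figures were computed. As a point of reference, here is a minimal sketch of the unbiased pass@k estimator commonly used with HumanEval. It assumes `n` samples are generated per problem and `c` of them pass the unit tests. The function names and the example counts are illustrative and are not taken from the SkyCode evaluation.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of pass@k for one problem: 1 - C(n-c, k) / C(n, k)."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: every draw of k contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results, k: int) -> float:
    """Average pass@k over problems; results is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical per-problem counts (n samples, c correct), for illustration only.
example_results = [(200, 0), (200, 5), (200, 40)]
for k in (1, 10, 100):
    print(f"pass@{k} = {mean_pass_at_k(example_results, k):.2%}")

# The gap quoted in the text is a difference in percentage points.
gap = round(35.97 - 27.74, 2)
print(f"SkyCode vs GPT-J 6B on pass@100: {gap} percentage points")
```

Because pass@k grows with `k`, pass@100 is sensitive to rare correct samples, which is why it is read as an upper bound on what the model can solve.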

# News of Singularity-AI
- [2022.12.15] [AIGC Press Conference of Singularity-AI](https://live.vhall.com/v3/lives/subscribe/697547540)

## Dependencies
```
Recommended:
transformers>=4.18.0
```
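
To confirm that an environment meets the recommended version, a check along the following lines may help. This is a minimal sketch using only the standard library. The helper names `parse_version` and `meets_minimum` are illustrative, and the simple numeric parsing assumes plain release versions such as `4.18.0` (any pre-release or development suffix is ignored).

```python
import re
from importlib.metadata import PackageNotFoundError, version

MIN_TRANSFORMERS = "4.18.0"  # recommended minimum from the README


def parse_version(text: str) -> tuple:
    """Turn '4.18.0' into (4, 18, 0), keeping only the leading numeric parts."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed: str, minimum: str = MIN_TRANSFORMERS) -> bool:
    """Return True if the installed version is at least the minimum."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '4.18' compares equal to '4.18.0'.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


try:
    installed = version("transformers")
    status = "OK" if meets_minimum(installed) else "older than recommended"
    print(f"transformers {installed}: {status}")
except PackageNotFoundError:
    print("transformers is not installed")
```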

## Model usage
```python
# -*- coding: utf-8 -*-
from transformers import GPT2LMHeadModel
from transformers import AutoTokenizer
from transformers import TextGenerationPipeline

# Load the SkyCode weights and tokenizer from the Hugging Face Hub.
model = GPT2LMHeadModel.from_pretrained("SkyWork/SkyCode")
tokenizer = AutoTokenizer.from_pretrained("SkyWork/SkyCode", trust_remote_code=True)

# device=0 selects the first GPU; use device=-1 to run on CPU instead.
text_generator = TextGenerationPipeline(model, tokenizer, device=0)

# Prompt with the start of a Python entry-point guard and sample up to 40 new tokens.
input_str = "if __name__"
max_new_tokens = 40
print(text_generator(input_str, max_new_tokens=max_new_tokens, do_sample=True))
```
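
Since the model is described as understanding Chinese comments, one way to try it is to prompt with a comment followed by a function signature and keep only the newly generated text. The sketch below assumes a `text_generator` pipeline built as in the usage example. `build_prompt` and `extract_completion` are hypothetical helper names, and `fake_generator` is a stand-in that exists only so the snippet runs without downloading the model.

```python
def build_prompt(comment: str, signature: str) -> str:
    """Combine a natural-language comment and a Python signature into a prompt."""
    return f"# {comment}\n{signature}\n"


def extract_completion(prompt: str, outputs: list) -> str:
    """Return only the generated continuation.

    A text-generation pipeline returns a list of dicts whose
    "generated_text" value includes the prompt by default.
    """
    text = outputs[0]["generated_text"]
    return text[len(prompt):] if text.startswith(prompt) else text


def fake_generator(prompt, **kwargs):
    """Stand-in for text_generator so the example runs offline (not the real model)."""
    return [{"generated_text": prompt + "    return sorted(items)\n"}]


# The comment below is Chinese for "sort the list".
prompt = build_prompt("对列表进行排序", "def sort_items(items):")
outputs = fake_generator(prompt, max_new_tokens=40, do_sample=True)
print(extract_completion(prompt, outputs))
```

With the real model, replace `fake_generator` with `text_generator`. Output will vary from run to run because `do_sample=True` enables sampling.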


# License
[MIT License](LICENSE)
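
---

# SkyCode

SkyCode is a multilingual open-source code model released by Singularity-AI (奇点智源). It adopts the GPT-3 model architecture and is trained on a massive amount of code. It supports Java, JavaScript, C, C++, Python, Go, shell, and other mainstream programming languages, and can understand Chinese comments. The model can complete code and solve programming problems, freeing you from routine programming so you can focus on larger problems.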