--- library_name: transformers license: other base_model: deepseek-ai/DeepSeek-Coder-V2-Base pipeline_tag: text-generation license_name: deepseek-license license_link: LICENSE ---
API Platform | How to Use | License |
# DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence ## 1. Introduction We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathematical reasoning capabilities of DeepSeek-V2, while maintaining comparable performance in general language tasks. Compared to DeepSeek-Coder-33B, DeepSeek-Coder-V2 demonstrates significant advancements in various aspects of code-related tasks, as well as reasoning and general capabilities. Additionally, DeepSeek-Coder-V2 expands its support for programming languages from 86 to 338, while extending the context length from 16K to 128K.In standard benchmark evaluations, DeepSeek-Coder-V2 achieves superior performance compared to closed-source models such as GPT4-Turbo, Claude 3 Opus, and Gemini 1.5 Pro in coding and math benchmarks. The list of supported programming languages can be found [here](https://github.com/deepseek-ai/DeepSeek-Coder-V2/blob/main/supported_langs.txt). ## 2. Model Downloads We release the DeepSeek-Coder-V2 with 16B and 236B parameters based on the [DeepSeekMoE](https://arxiv.org/pdf/2401.06066) framework, which has actived parameters of only 2.4B and 21B , including base and instruct models, to the public.
## 5. How to run locally **Here, we provide some examples of how to use DeepSeek-Coder-V2-Lite model. If you want to utilize DeepSeek-Coder-V2 in BF16 format for inference, 80GB*8 GPUs are required.** ### Inference with Huggingface's Transformers You can directly employ [Huggingface's Transformers](https://github.com/huggingface/transformers) for model inference. #### Code Completion ```python from transformers import AutoTokenizer, AutoModelForCausalLM import torch tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Base", trust_remote_code=True) model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Base", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda() input_text = "#write a quick sort algorithm" inputs = tokenizer(input_text, return_tensors="pt").to(model.device) outputs = model.generate(**inputs, max_length=128) print(tokenizer.decode(outputs[0], skip_special_tokens=True)) ``` #### Code Insertion ```python from transformers import AutoTokenizer, AutoModelForCausalLM import torch tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Base", trust_remote_code=True) model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Base", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda() input_text = """<ļ½fimābeginļ½>def quick_sort(arr): if len(arr) <= 1: return arr pivot = arr[0] left = [] right = [] <ļ½fimāholeļ½> if arr[i] < pivot: left.append(arr[i]) else: right.append(arr[i]) return quick_sort(left) + [pivot] + quick_sort(right)<ļ½fimāendļ½>""" inputs = tokenizer(input_text, return_tensors="pt").to(model.device) outputs = model.generate(**inputs, max_length=128) print(tokenizer.decode(outputs[0], skip_special_tokens=True)[len(input_text):]) ``` #### Chat Completion ```python from transformers import AutoTokenizer, AutoModelForCausalLM import torch tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct", trust_remote_code=True) model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda() messages=[ { 'role': 'user', 'content': "write a quick sort algorithm in python."} ] inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device) # tokenizer.eos_token_id is the id of <ļ½endāofāsentenceļ½> token outputs = model.generate(inputs, max_new_tokens=512, do_sample=False, top_k=50, top_p=0.95, num_return_sequences=1, eos_token_id=tokenizer.eos_token_id) print(tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True)) ``` The complete chat template can be found within `tokenizer_config.json` located in the huggingface model repository. An example of chat template is as belows: ```bash <ļ½begināofāsentenceļ½>User: {user_message_1} Assistant: {assistant_message_1}<ļ½endāofāsentenceļ½>User: {user_message_2} Assistant: ``` You can also add an optional system message: ```bash <ļ½begināofāsentenceļ½>{system_message} User: {user_message_1} Assistant: {assistant_message_1}<ļ½endāofāsentenceļ½>User: {user_message_2} Assistant: ``` ### Inference with vLLM (recommended) To utilize [vLLM](https://github.com/vllm-project/vllm) for model inference, please merge this Pull Request into your vLLM codebase: https://github.com/vllm-project/vllm/pull/4650. ```python from transformers import AutoTokenizer from vllm import LLM, SamplingParams max_model_len, tp_size = 8192, 1 model_name = "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct" tokenizer = AutoTokenizer.from_pretrained(model_name) llm = LLM(model=model_name, tensor_parallel_size=tp_size, max_model_len=max_model_len, trust_remote_code=True, enforce_eager=True) sampling_params = SamplingParams(temperature=0.3, max_tokens=256, stop_token_ids=[tokenizer.eos_token_id]) messages_list = [ [{"role": "user", "content": "Who are you?"}], [{"role": "user", "content": "write a quick sort algorithm in python."}], [{"role": "user", "content": "Write a piece of quicksort code in C++."}], ] prompt_token_ids = [tokenizer.apply_chat_template(messages, add_generation_prompt=True) for messages in messages_list] outputs = llm.generate(prompt_token_ids=prompt_token_ids, sampling_params=sampling_params) generated_text = [output.outputs[0].text for output in outputs] print(generated_text) ``` ## 5. New Features ššš ### Function calling Function calling allows the model to call external tools to enhance its capabilities. Here is an example: ```python # Assume that `model` and `tokenizer` are loaded model.generation_config = GenerationConfig(do_sample=False, max_new_tokens=128, eos_token_id=tokenizer.eos_token_id, pad_token_id=tokenizer.eos_token_id) tool_system_prompt = """You are a helpful Assistant. ## Tools ### Function You have the following functions available: - `get_current_weather`: ```json { "name": "get_current_weather", "description": "Get the current weather in a given location", "parameters": { "type": "object", "properties": { "location": { "type": "string", "description": "The city and state, e.g. San Francisco, CA" }, "unit": { "type": "string", "enum": [ "celsius", "fahrenheit" ] } }, "required": [ "location" ] } } ```""" tool_call_messages = [{"role": "system", "content": tool_system_prompt}, {"role": "user", "content": "What's the weather like in Tokyo and Paris?"}] tool_call_inputs = tokenizer.apply_chat_template(tool_call_messages, add_generation_prompt=True, return_tensors="pt") tool_call_outputs = model.generate(tool_call_inputs.to(model.device)) # Generated text: '<ļ½toolācallsābeginļ½><ļ½toolācallābeginļ½>function<ļ½toolāsepļ½>get_current_weather\n```json\n{"location": "Tokyo"}\n```<ļ½toolācallāendļ½>\n<ļ½toolācallābeginļ½>function<ļ½toolāsepļ½>get_current_weather\n```json\n{"location": "Paris"}\n```<ļ½toolācallāendļ½><ļ½toolācallsāendļ½><ļ½endāofāsentenceļ½>' # Mock response of calling `get_current_weather` tool_messages = [{"role": "tool", "content": '{"location": "Tokyo", "temperature": "10", "unit": null}'}, {"role": "tool", "content": '{"location": "Paris", "temperature": "22", "unit": null}'}] tool_inputs = tokenizer.apply_chat_template(tool_messages, add_generation_prompt=False, return_tensors="pt")[:, 1:] tool_inputs = torch.cat([tool_call_outputs, tool_inputs.to(model.device)], dim=1) tool_outputs = model.generate(tool_inputs) # Generated text: The current weather in Tokyo is 10 degrees, and in Paris, it is 22 degrees.<ļ½endāofāsentenceļ½> ``` ### JSON output You can use JSON Output Mode to ensure the model generates a valid JSON object. To active this mode, a special instruction should be appended to your system prompt. ```python # Assume that `model` and `tokenizer` are loaded model.generation_config = GenerationConfig(do_sample=False, max_new_tokens=128, eos_token_id=tokenizer.eos_token_id, pad_token_id=tokenizer.eos_token_id) user_system_prompt = 'The user will provide some exam text. Please parse the "question" and "answer" and output them in JSON format.' json_system_prompt = f"""{user_system_prompt} ## Response Format Reply with JSON object ONLY.""" json_messages = [{"role": "system", "content": json_system_prompt}, {"role": "user", "content": "Which is the highest mountain in the world? Mount Everest."}] json_inputs = tokenizer.apply_chat_template(json_messages, add_generation_prompt=True, return_tensors="pt") json_outpus = model.generate(json_inputs.to(model.device)) # Generated text: '```json\n{\n "question": "Which is the highest mountain in the world?",\n "answer": "Mount Everest."\n}\n```<ļ½endāofāsentenceļ½>' ``` ### FIM completion In FIM (Fill In the Middle) completion, you can provide a prefix and an optional suffix, and the model will complete the content in between. ```python # Assume that `model` and `tokenizer` are loaded model.generation_config = GenerationConfig(do_sample=False, max_new_tokens=128, eos_token_id=tokenizer.eos_token_id, pad_token_id=tokenizer.eos_token_id) prefix = """def quick_sort(arr): if len(arr) <= 1: return arr pivot = arr[0] left = [] right = [] """ suffix = """ if arr[i] < pivot: left.append(arr[i]) else: right.append(arr[i]) return quick_sort(left) + [pivot] + quick_sort(right)""" fim_prompt = f"<ļ½fimābeginļ½>{prefix}<ļ½fimāholeļ½>{suffix}<ļ½fimāendļ½>" fim_inputs = tokenizer(fim_prompt, add_special_tokens=True, return_tensors="pt").input_ids fim_outputs = model.generate(fim_inputs.to(model.device)) # Generated text: " for i in range(1, len(arr)):<ļ½endāofāsentenceļ½>" ``` ## 6. License This code repository is licensed under [the MIT License](https://github.com/deepseek-ai/DeepSeek-Coder-V2/blob/main/LICENSE-CODE). The use of DeepSeek-Coder-V2 Base/Instruct models is subject to [the Model License](https://github.com/deepseek-ai/DeepSeek-Coder-V2/blob/main/LICENSE-MODEL). DeepSeek-Coder-V2 series (including Base and Instruct) supports commercial use. ## 7. Contact If you have any questions, please raise an issue or contact us at [service@deepseek.com](service@deepseek.com).