---
base_model:
- codellama/CodeLlama-7b-Python-hf
- tokyotech-llm/Swallow-7b-instruct-hf
tags:
- mergekit
- merge
---
# merged-output

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

## Merge Details
### Merge Method

This model was merged using the [DARE](https://arxiv.org/abs/2311.03099) [TIES](https://arxiv.org/abs/2306.01708) merge method using [tokyotech-llm/Swallow-7b-instruct-hf](https://huggingface.co/tokyotech-llm/Swallow-7b-instruct-hf) as a base.

### Models Merged

The following models were included in the merge:
* [codellama/CodeLlama-7b-Python-hf](https://huggingface.co/codellama/CodeLlama-7b-Python-hf)

### Configuration

The following YAML configuration was used to produce this model:

```yaml
models:
  - model: tokyotech-llm/Swallow-7b-instruct-hf
    # no parameters necessary for base model
  - model: codellama/CodeLlama-7b-Python-hf # follow user intent
    parameters:
      density: 1
      weight:
        - filter: mlp.down_proj
          value: [0.3, 0.25, 0.25, 0.15, 0.1]
        - filter: mlp.gate_proj
          value: [0.7, 0.25, 0.5, 0.45, 0.4]
        - filter: mlp.up_proj
          value: [0.7, 0.25, 0.5, 0.45, 0.4]
        - filter: self_attn
          value: [0.7, 0.25, 0.5, 0.45, 0.4]
        - value: 0 # fallback for rest of tensors.
merge_method: dare_ties
base_model: tokyotech-llm/Swallow-7b-instruct-hf
dtype: bfloat16
tokenizer_source: union
```
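
The list-valued `weight` entries in the configuration are easier to read once expanded to one value per layer. The following is a minimal sketch, assuming that a list is treated as a gradient that is linearly interpolated over layer depth, and assuming 32 transformer layers for a 7B Llama-style model. `interpolate_gradient` is a hypothetical helper written for illustration and is not part of the mergekit API.

```python
def interpolate_gradient(anchors, num_layers):
    """Expand a short list of anchor values into one value per layer.

    Assumption: anchors are spaced evenly over the layer stack and
    values between anchors are linearly interpolated.
    """
    if num_layers == 1 or len(anchors) == 1:
        return [float(anchors[0])] * num_layers

    per_layer = []
    for layer in range(num_layers):
        # Map the layer index onto the anchor index range [0, len(anchors) - 1].
        pos = layer / (num_layers - 1) * (len(anchors) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(anchors) - 1)
        frac = pos - lo
        per_layer.append(anchors[lo] * (1 - frac) + anchors[hi] * frac)
    return per_layer


# Anchor lists copied from the configuration above.
down_proj = interpolate_gradient([0.3, 0.25, 0.25, 0.15, 0.1], num_layers=32)
self_attn = interpolate_gradient([0.7, 0.25, 0.5, 0.45, 0.4], num_layers=32)

for layer, (d, a) in enumerate(zip(down_proj, self_attn)):
    print(f"layer {layer:2d}  mlp.down_proj={d:.3f}  self_attn={a:.3f}")
```

Under these assumptions, the CodeLlama contribution to `mlp.down_proj` tapers from 0.3 in the first layer to 0.1 in the last, while tensors that match no filter fall back to a weight of 0 and are taken from the base model alone.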