Update configuration_llama.py: `rope_scaling` must default to None (or be a dictionary) for the model to load
I checked how the NousResearch YaRN models handle this, and they define `rope_scaling=None` in this Python file. That default is then overridden by the values specified in your config.json.
Before this PR I couldn't load the model; with it applied the model loads, and the session below confirms that the correct `rope_scaling` is applied:
```
In [1]: from transformers import AutoModelForCausalLM, AutoTokenizer
In [2]: model = AutoModelForCausalLM.from_pretrained("/workspace/process/ddh0_norocetacean-20b-10k/source" , low_cpu_mem_usage=True, trust_remote_code=True)
Loading checkpoint shards: 100%|██████████████████████████████████████████████| 5/5 [00:08<00:00,  1.77s/it]
In [4]: model.config.rope_scaling
Out[4]:
{'factor': 2.5,
'original_max_position_embeddings': 4096,
'type': 'yarn',
'finetuned': False}
In [5]:
```
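For a quick check without loading the weights, the same `rope_scaling` dict can be read straight from config.json via `AutoConfig` — a minimal sketch, reusing the local path from the session above:

```python
from transformers import AutoConfig

# Reads config.json (and the repo's custom configuration_llama.py via
# trust_remote_code); rope_scaling comes from config.json, overriding the None default.
config = AutoConfig.from_pretrained(
    "/workspace/process/ddh0_norocetacean-20b-10k/source",
    trust_remote_code=True,
)
print(config.rope_scaling)
# {'factor': 2.5, 'original_max_position_embeddings': 4096,
#  'type': 'yarn', 'finetuned': False}
```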
- configuration_llama.py (+1 −1)

```diff
@@ -124,7 +124,7 @@ class LlamaConfig(PretrainedConfig):
         pretraining_tp=1,
         tie_word_embeddings=False,
         rope_theta=10000,
-        rope_scaling=
+        rope_scaling=None,
         attention_bias=False,
         **kwargs,
     ):
```
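With `None` as the default, the config can also be constructed directly without passing a `rope_scaling` argument. A minimal sketch using the stock transformers `LlamaConfig` to illustrate the pattern (the stock class only validates 'linear'/'dynamic' scaling, so a 'linear' dict is used here; the repo's custom configuration_llama.py accepts the 'yarn' dict the same way):

```python
from transformers import LlamaConfig

# Omitting rope_scaling falls back to the None default; no scaling is applied.
cfg = LlamaConfig()
print(cfg.rope_scaling)  # None

# Passing a dict (as config.json does) overrides the None default.
cfg = LlamaConfig(rope_scaling={"type": "linear", "factor": 2.5})
print(cfg.rope_scaling)  # {'type': 'linear', 'factor': 2.5}
```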