`max_position_embeddings` in config.json
#1 by naive-puzzle - opened
Why has the max_position_embeddings parameter in the configuration file been changed to 4096? Since Mistral-7B's sliding window is itself 4096 tokens, capping the context at 4096 appears to render the sliding-window attention ineffective. Is there a particular rationale behind this adjustment?
stabilityai/japanese-stablelm-base-gamma-7b
"intermediate_size": 14336,
"max_position_embeddings": 4096,
"model_type": "mistral",
mistralai/Mistral-7B-v0.1
"intermediate_size": 14336,
"max_position_embeddings": 32768,
"model_type": "mistral",
By the way, if the sequences you used during training were at most 4096 tokens long, then max_position_embeddings should not affect the model's weights after training at all: as the rotary-embedding forward below shows, it only determines the initial size of the cos/sin cache, which is regrown on the fly whenever a longer sequence arrives.
    # From the rotary embedding module in transformers: max_seq_len_cached starts at
    # max_position_embeddings, and the cos/sin cache is recomputed on demand for
    # longer sequences.
    def forward(self, x, seq_len=None):
        # x: [bs, num_attention_heads, seq_len, head_size]
        if seq_len > self.max_seq_len_cached:
            self._set_cos_sin_cache(seq_len=seq_len, device=x.device, dtype=x.dtype)

        return (
            self.cos_cached[:seq_len].to(dtype=x.dtype),
            self.sin_cached[:seq_len].to(dtype=x.dtype),
        )
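To make that concrete, here is a small, self-contained sketch (assuming the standard RoPE formulation with base 10000 and a head dimension of 128; this is not the transformers code) showing that the cos/sin values for the first 4096 positions are identical no matter how many positions are precomputed:

    import torch

    # max_position_embeddings only decides how many positions are precomputed;
    # the cos/sin values themselves do not depend on it.
    def rope_cos_sin(max_positions, dim=128, base=10000.0):
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
        t = torch.arange(max_positions).float()
        freqs = torch.outer(t, inv_freq)          # [max_positions, dim // 2]
        emb = torch.cat((freqs, freqs), dim=-1)   # [max_positions, dim]
        return emb.cos(), emb.sin()

    cos_4k, sin_4k = rope_cos_sin(4096)
    cos_32k, sin_32k = rope_cos_sin(32768)

    # The first 4096 positions match exactly, so a model trained only on
    # sequences of <= 4096 tokens sees identical rotary embeddings either way.
    assert torch.allclose(cos_4k, cos_32k[:4096])
    assert torch.allclose(sin_4k, sin_32k[:4096])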