huggyllama/llama-65b
#1, opened by KnutJaegersberg
The config file has this as the file path, which looks a little weird:
huggyllama/llama-65b
LLM360/K2 would look better :)
KnutJaegersberg changed discussion status to closed
Hah yeah I agree, it looks funny.
Additionally, the lack of GQA (grouped-query attention) is an architectural choice that is puzzling to me.
> The config file has this as the file path, which looks a little weird:
> huggyllama/llama-65b
Yeah, thanks for spotting this. It happened because, when we did the checkpoint conversion, we loaded an existing model and then modified it, loading our own weights, etc.
Fixing it now.
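For anyone who hits the same leftover after a conversion, here is a minimal sketch of the kind of fix involved. It assumes the stray path is stored in the `_name_or_path` field of `config.json` (the thread does not name the field), and `fix_name_or_path` and `rewrite_config_file` are hypothetical helpers, not part of any library.

```python
import json
from pathlib import Path


def fix_name_or_path(config: dict, new_name: str) -> dict:
    """Return a copy of `config` with `_name_or_path` set to `new_name`.

    Assumption: the leftover base-model path lives under `_name_or_path`.
    """
    fixed = dict(config)  # shallow copy so the input dict is left untouched
    fixed["_name_or_path"] = new_name
    return fixed


def rewrite_config_file(path: str, new_name: str) -> None:
    """Rewrite a config.json on disk in place (hypothetical helper)."""
    p = Path(path)
    config = json.loads(p.read_text(encoding="utf-8"))
    fixed = fix_name_or_path(config, new_name)
    p.write_text(json.dumps(fixed, indent=2) + "\n", encoding="utf-8")


# Illustrative fragment; only the two path strings come from this thread.
stale = {"_name_or_path": "huggyllama/llama-65b"}
print(fix_name_or_path(stale, "LLM360/K2")["_name_or_path"])
```

Setting the field explicitly after conversion, rather than relying on whatever the loaded base model carried, avoids the stale value ending up in the saved config.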
> Hah yeah I agree, it looks funny.
> Additionally, the lack of GQA (grouped-query attention) is an architectural choice that is puzzling to me.
Maybe not the best choice, now that I am looking at it. During our initial design, we tended to make simple choices, since our goal is to make research on these models easier.
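For context on the GQA point, a rough sketch of what the choice affects: with grouped-query attention, several query heads share one key/value head, which shrinks the KV cache. The helpers below classify a config and count KV-cache elements per token. They assume Llama-style field names (`num_attention_heads`, `num_key_value_heads`, `hidden_size`, `num_hidden_layers`), and the numbers are made up for illustration; they are not K2's actual configuration.

```python
def attention_kind(config: dict) -> str:
    """Classify a config as MHA, MQA, or GQA from its head counts."""
    n_heads = config["num_attention_heads"]
    # Assumption: a missing `num_key_value_heads` means one KV head per query head.
    n_kv = config.get("num_key_value_heads", n_heads)
    if n_kv == n_heads:
        return "MHA"
    if n_kv == 1:
        return "MQA"
    return "GQA"


def kv_cache_elems_per_token(config: dict) -> int:
    """Count cached key and value elements stored per token across all layers."""
    n_heads = config["num_attention_heads"]
    n_kv = config.get("num_key_value_heads", n_heads)
    head_dim = config["hidden_size"] // n_heads
    # 2 = one key vector plus one value vector per KV head, per layer.
    return 2 * config["num_hidden_layers"] * n_kv * head_dim


# Hypothetical configs for illustration only.
mha = {"hidden_size": 4096, "num_attention_heads": 32, "num_hidden_layers": 32}
gqa = {**mha, "num_key_value_heads": 8}

print(attention_kind(mha), kv_cache_elems_per_token(mha))  # MHA 262144
print(attention_kind(gqa), kv_cache_elems_per_token(gqa))  # GQA 65536
```

In this made-up example, going from 32 to 8 KV heads cuts the cache by a factor of four, which is the usual argument for GQA at inference time; plain multi-head attention is the simpler design, in line with the reply above.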