
Error in readme?

#6 opened by CHNtentes

"specifically, we prune model embedding size, number of attention heads, and MLP intermediate dimension"

However, the number of attention heads is 32 for both this model and Llama 3.1, and the number of KV heads is likewise unchanged.
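
This can be checked by loading just the configs of the two checkpoints and comparing the pruned axes. A minimal sketch, assuming the model IDs `meta-llama/Llama-3.1-8B` and `nvidia/Llama-3.1-Minitron-4B-Width-Base` (neither is confirmed in this thread; the gated base model may also require an authenticated Hugging Face login):

```python
# Sketch: compare the width-pruning axes between the base Llama 3.1 config
# and the pruned checkpoint. Only configs are fetched, not weights.
# NOTE: both model IDs below are assumptions for illustration.
from transformers import AutoConfig

base = AutoConfig.from_pretrained("meta-llama/Llama-3.1-8B")
pruned = AutoConfig.from_pretrained("nvidia/Llama-3.1-Minitron-4B-Width-Base")

# hidden_size = embedding dimension; intermediate_size = MLP width;
# the head counts are expected to match per the observation above.
for field in ("hidden_size", "intermediate_size",
              "num_attention_heads", "num_key_value_heads"):
    print(f"{field}: base={getattr(base, field)}, pruned={getattr(pruned, field)}")
```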

NVIDIA org

Good catch, fixed.

srvm changed discussion status to closed
