
Error in readme?

#6 opened by CHNtentes

"specifically, we prune model embedding size, number of attention heads, and MLP intermediate dimension"

However, the number of attention heads is 32 for both this model and Llama 3.1, and the number of KV heads is likewise unchanged.
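
This can be checked by loading just the configs of the two checkpoints and comparing the pruned axes. A minimal sketch, assuming the model IDs `meta-llama/Llama-3.1-8B` and `nvidia/Llama-3.1-Minitron-4B-Width-Base` (neither is confirmed in this thread; the gated base model may also require an authenticated Hugging Face login):

```python
# Sketch: compare the width-pruning axes between the base Llama 3.1 config
# and the pruned checkpoint. Only configs are fetched, not weights.
# NOTE: both model IDs below are assumptions for illustration.
from transformers import AutoConfig

base = AutoConfig.from_pretrained("meta-llama/Llama-3.1-8B")
pruned = AutoConfig.from_pretrained("nvidia/Llama-3.1-Minitron-4B-Width-Base")

# hidden_size = embedding dimension; intermediate_size = MLP width;
# the head counts are expected to match per the observation above.
for field in ("hidden_size", "intermediate_size",
              "num_attention_heads", "num_key_value_heads"):
    print(f"{field}: base={getattr(base, field)}, pruned={getattr(pruned, field)}")
```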

NVIDIA org

Good catch, fixed.

srvm changed discussion status to closed
