Why is the pruned model larger than the original after 24 layers have been sliced?
#1 opened by iheardyoulooking
Usually after structured pruning the model size should be smaller, but:
original model: 15 GB
sliced model: 20 GB+
@iheardyoulooking it's because the model has been uploaded in 32-bit float format, whereas the original Mistral is bfloat16. That makes each parameter in the sliced version twice as big on disk.
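To make the arithmetic concrete, here is a quick sanity check of the per-value storage cost (the ~5.6B parameter count below is a rough estimate for the 24-layer slice, not an exact figure):

import torch

# float32 stores 4 bytes per value; bfloat16 stores 2
print(torch.tensor(0, dtype=torch.float32).element_size())   # 4
print(torch.tensor(0, dtype=torch.bfloat16).element_size())  # 2

# back-of-envelope for a ~5.6B-parameter sliced model:
# 5.6e9 params * 4 bytes ≈ 22 GB in float32
# 5.6e9 params * 2 bytes ≈ 11 GB in bfloat16

That lines up with the 20 GB+ you're seeing on disk.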
You can still load the model in 16-bit by passing a torch_dtype argument:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('arcee-ai/Mistral-7B-Instruct-v0.2-sliced-24-layer')

# load the weights in bfloat16 to match the original Mistral's precision
model = AutoModelForCausalLM.from_pretrained(
    'arcee-ai/Mistral-7B-Instruct-v0.2-sliced-24-layer',
    torch_dtype=torch.bfloat16,
)
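If you also want the checkpoint itself to shrink, you can re-save the weights after loading them in bfloat16 (a minimal sketch; './sliced-bf16' is just an arbitrary local path):

# serialize the bfloat16 weights, roughly halving the on-disk size
model.save_pretrained('./sliced-bf16')
tokenizer.save_pretrained('./sliced-bf16')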
Shamane changed discussion status to closed