Why is the pruned model larger than the original after 24 layers have been sliced?
#1 opened by iheardyoulooking
Usually after structured pruning the model size should be smaller, but:
original model: 15 GB
sliced model: 20 GB+
@iheardyoulooking it's because the model has been uploaded in 32-bit float format, whereas the original Mistral is bfloat16. That makes each parameter in the sliced version twice as big on disk.
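To make the arithmetic concrete, here is a quick sanity check of the per-value storage cost (the ~5.6B parameter count below is a rough estimate for the 24-layer slice, not an exact figure):

import torch

# float32 stores 4 bytes per value; bfloat16 stores 2
print(torch.tensor(0, dtype=torch.float32).element_size())   # 4
print(torch.tensor(0, dtype=torch.bfloat16).element_size())  # 2

# back-of-envelope for a ~5.6B-parameter sliced model:
# 5.6e9 params * 4 bytes ≈ 22 GB in float32
# 5.6e9 params * 2 bytes ≈ 11 GB in bfloat16

That lines up with the 20 GB+ you're seeing on disk.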
You can still load the model in 16-bit by passing a torch_dtype argument:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('arcee-ai/Mistral-7B-Instruct-v0.2-sliced-24-layer')

# load the weights in bfloat16 to match the original Mistral's precision
model = AutoModelForCausalLM.from_pretrained(
    'arcee-ai/Mistral-7B-Instruct-v0.2-sliced-24-layer',
    torch_dtype=torch.bfloat16,
)
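If you also want the checkpoint itself to shrink, you can re-save the weights after loading them in bfloat16 (a minimal sketch; './sliced-bf16' is just an arbitrary local path):

# serialize the bfloat16 weights, roughly halving the on-disk size
model.save_pretrained('./sliced-bf16')
tokenizer.save_pretrained('./sliced-bf16')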
Shamane changed discussion status to closed