
Trainable layers impossible to control in Vision Tower

#9
by edmond - opened

It seems that no matter what setup I choose, whichever layers in the Vision Tower I set to be trainable, like here:

for name, param in self.llm.named_parameters():
    param.requires_grad = (('.ln.' in name.lower()
                            or 'norm' in name.lower()
                            or 'transformer.h.0' in name.lower()
                            or 'vision_model.encoder.layers.0.' in name.lower()
                            or 'vision_model.encoder.layers.1.' in name.lower())
                           and ('out_proj' not in name.lower()))

it changes nothing; the training behavior stays the same. Why is that?

(I printed which layers are trainable and everything looks fine. Oddly enough, PyTorch did register my changes in the Vision Tower, since the number of trainable weights changed. And I know I am doing it right in general, because my changes to the trainable layers in the LLM part do have an actual impact on the training/validation loss and on the weight values.)
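Name-based unfreezing like the snippet above does select the right parameters, which matches the observation that the trainable-parameter count changes. Here is a minimal, self-contained sketch of that selection logic; the module names are stand-ins chosen only to match the patterns in the snippet, not the real Imp architecture:

```python
import torch.nn as nn

# Toy stand-in model with an "LLM" block (transformer.h.*) and a
# "vision tower" block (vision_model.encoder.layers.*), so the
# name-based freezing pattern from the post can be exercised.
model = nn.ModuleDict({
    "transformer": nn.ModuleDict({
        "h": nn.ModuleList([nn.Linear(4, 4) for _ in range(2)]),
    }),
    "vision_model": nn.ModuleDict({
        "encoder": nn.ModuleDict({
            "layers": nn.ModuleList([nn.Linear(4, 4) for _ in range(3)]),
        }),
    }),
})

# Same name-matching idea as in the post: unfreeze only selected layers.
for name, param in model.named_parameters():
    param.requires_grad = (
        "transformer.h.0" in name.lower()
        or "vision_model.encoder.layers.0." in name.lower()
        or "vision_model.encoder.layers.1." in name.lower()
    )

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
frozen = [n for n, p in model.named_parameters() if not p.requires_grad]
print(len(trainable), len(frozen))
```

Printing the two lists is a quick sanity check that the patterns match the intended layers, but, as the rest of this thread shows, `requires_grad=True` alone is not enough if gradients never reach those weights in the forward pass.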

We will try to figure out this question. Are you trying to fine-tune Imp on your custom dataset?

Yes, that is what I am trying to do. I did it successfully, but the vision tower is resisting me.

Perhaps you can try to fine-tune Imp on your custom dataset with the script we provide in the Imp GitHub repository; we will also update the new Phi-2 version on GitHub.

Ah OK, sure. I was hoping I could just stick to my usual Hugging Face model training, but I can try that too when I find some time.

edmond changed discussion status to closed

Now I know: the vision tower must be extracted from the model to force the trainable weights.
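A plausible explanation for this (an assumption about the wrapper, not the actual Imp code) is that the composite model runs its vision tower under `torch.no_grad()` or detaches its output, so setting `requires_grad=True` on those weights has no effect on training. This toy reproduction shows the symptom and why calling the extracted tower directly fixes it; `Wrapper` and its layers are hypothetical:

```python
import torch
import torch.nn as nn

# Hypothetical wrapper that computes vision features under torch.no_grad(),
# so the vision tower's weights never receive gradients, regardless of
# their requires_grad flags.
class Wrapper(nn.Module):
    def __init__(self):
        super().__init__()
        self.vision_tower = nn.Linear(4, 4)
        self.head = nn.Linear(4, 1)

    def forward(self, x):
        with torch.no_grad():          # vision features detached from the graph
            feats = self.vision_tower(x)
        return self.head(feats)

model = Wrapper()
for p in model.parameters():
    p.requires_grad = True             # has no effect on the vision tower here

model(torch.randn(2, 4)).sum().backward()
print(model.vision_tower.weight.grad)              # None: no gradient flowed
print(model.head.weight.grad is not None)          # the head did train

# Fix: extract the vision tower and call it outside the no_grad context,
# so its weights participate in the autograd graph.
feats = model.vision_tower(torch.randn(2, 4))
model.head(feats).sum().backward()
print(model.vision_tower.weight.grad is not None)  # gradient now present
```

This would also explain the earlier observation: the trainable-parameter count changed (the flags were set correctly), yet the training behavior did not, because the forward pass never built a graph through those weights.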
