Discrepancy between num_features and the classifier's in_features
Sorry for opening a new Discussion, but I think it may help others.
```python
import timm

model = timm.create_model('timm/mobilenetv4_conv_small.e2400_r224_in1k', pretrained=True)
print(model.num_features)  # 960
```
After running these two instructions, the printed number of features is 960, but the final Linear classifier layer has `in_features=1280` (which is also the default value, as you can see from the MobileNetV3 class implementation on GitHub).
However, I can't figure out why the printed `num_features` doesn't match the effective number of features going into the classification head.
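For reference, this is how I checked the classifier input size (via timm's `get_classifier()` accessor):

```python
print(model.get_classifier())
# Linear(in_features=1280, out_features=1000, bias=True)
```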
Thank you all!
@Elsospi mnv4 is like mnv3: it has a linear layer after global pool that is considered part of the head, so it's a bit different from other CNNs.
So `num_features` matches the features of `forward_features()`, which is a spatial feature map. `head_hidden_size` is the pooled features after the last (EDIT: last meaning the last one before the classifier, the penultimate) linear layer in the head. You need to use `forward_head(pre_logits=True)`, or set `num_classes=0` / call `reset_classifier()`, to get those pre-logits features.
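A minimal sketch of those three routes (shapes shown assume a 224x224 input and this mnv4 conv_small model):

```python
import timm
import torch

model = timm.create_model('timm/mobilenetv4_conv_small.e2400_r224_in1k', pretrained=True)
x = torch.randn(1, 3, 224, 224)

# 1) keep the classifier but stop right before it
pre_logits = model.forward_head(model.forward_features(x), pre_logits=True)
print(pre_logits.shape)  # torch.Size([1, 1280])

# 2) create the model with no classifier at all
backbone = timm.create_model('timm/mobilenetv4_conv_small.e2400_r224_in1k',
                             pretrained=True, num_classes=0)
print(backbone(x).shape)  # torch.Size([1, 1280])

# 3) remove the classifier from an existing model
model.reset_classifier(0)
print(model(x).shape)  # torch.Size([1, 1280])
```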
Thank you so much for the answers, both here and on the other question; it's clearer now.
Just a note: once you cut the classification head with `num_classes=0` and want to customise the head (specifically, when using the model as a backbone for a siamese neural network with a parametrised `embedding_size`), you have to take that last linear layer into account, so the number of features you'll be working with is 1280 rather than 960.
Example:

```python
# inside the __init__ of the embedding module
self.base_model = base_model  # MNV4 created with num_classes=0
self.flatten = nn.Flatten()
self.fc = nn.Linear(1280, embedding_size)  # !!! 1280, not 960
self.l2_norm = nn.functional.normalize  # L2-normalise the embedding in forward()
```
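For context, here is a self-contained sketch of how those pieces might fit together; the `SiameseEmbedder` class name and the wiring are my own illustration, not from the thread:

```python
import timm
import torch
from torch import nn

class SiameseEmbedder(nn.Module):
    # hypothetical wrapper: MNV4 backbone + linear projection to embedding_size
    def __init__(self, embedding_size: int):
        super().__init__()
        self.base_model = timm.create_model(
            'timm/mobilenetv4_conv_small.e2400_r224_in1k',
            pretrained=True, num_classes=0)  # outputs pooled 1280-dim features
        self.fc = nn.Linear(1280, embedding_size)

    def forward(self, x):
        feats = self.base_model(x)  # (B, 1280), already pooled and flattened
        return nn.functional.normalize(self.fc(feats), dim=1)  # L2-normalised

print(SiameseEmbedder(embedding_size=128)(torch.randn(2, 3, 224, 224)).shape)
# torch.Size([2, 128])
```

Note that with `num_classes=0` the timm head already pools and flattens, so the extra `nn.Flatten()` in the snippet above is harmless but redundant.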
@Elsospi yes, using the 'generic' model interface that works across all models, this is the case. But if you know the model structure, you can modify it to remove that layer: `model.conv_head = nn.Identity()` and `model.conv_norm = nn.Identity()` (if this one exists).
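A sketch of that surgery; the head attribute names differ across timm versions and architectures, so this checks before replacing (`conv_norm` is quoted from the reply above, `norm_head` is my guess at an alternative name):

```python
import timm
import torch
from torch import nn

model = timm.create_model('timm/mobilenetv4_conv_small.e2400_r224_in1k',
                          pretrained=True, num_classes=0)

# swap the 960 -> 1280 head projection (and its norm, if present) for no-ops
for name in ('conv_head', 'conv_norm', 'norm_head'):
    if hasattr(model, name):
        setattr(model, name, nn.Identity())

print(model(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 960])
```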
You can also call forward_features(), get the unpooled output at 960 channels, and then pool to your liking in a custom head.
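For example, with global average pooling as an arbitrary choice:

```python
import timm
import torch

model = timm.create_model('timm/mobilenetv4_conv_small.e2400_r224_in1k', pretrained=True)
feats = model.forward_features(torch.randn(1, 3, 224, 224))
print(feats.shape)               # torch.Size([1, 960, 7, 7]), spatial map
pooled = feats.mean(dim=(2, 3))  # custom global average pool
print(pooled.shape)              # torch.Size([1, 960])
```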