Embeddings generated are different between GitHub repo and Hugging Face models

#4
by Palaash - opened

Hi,

I recently observed that the embeddings generated by the models provided in the GitHub repo and the Hugging Face models are different for the same input sequence.

In the Hugging Face model, embeddings are generated after layer normalization (for example, layer 23 for my model). However, in the model from the GitHub repo, embeddings are generated before layer normalization.

This discrepancy might lead to differences in downstream applications. I'd appreciate it if you could clarify this issue and suggest a way to make both models consistent in terms of the generated embeddings.

Thank you!

InstaDeep Ltd org

Hello Palaash,

The model has 24 transformer layers.

If you look at the hidden_states retrieved after inference with the Hugging Face nucleotide-transformer-500m-human-ref model, you'll see that it is a tuple of 25 tensors: one per layer, with the final one extracted after the first layer norm of the RoBERTa LM head.
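For reference, here is a minimal sketch (not from the original discussion; model loading follows the model card, and `trust_remote_code` may be needed on older transformers versions) of how those 25 tensors can be retrieved:

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_name = "InstaDeepAI/nucleotide-transformer-500m-human-ref"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

sequences = ["ATTCCGATTCCGATTCCG"]
tokens = tokenizer(sequences, return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**tokens, output_hidden_states=True)

hidden_states = outputs.hidden_states
print(len(hidden_states))  # 25 tensors

# Per the explanation above:
# second-to-last tensor: output of the last transformer layer,
embeddings_before_layer_norm = hidden_states[-2]
# last tensor: after the first layer norm of the RoBERTa LM head.
embeddings_after_layer_norm = hidden_states[-1]
```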

If you use the GitHub JAX repository, you can also retrieve 25 different embeddings. The layers are 0-indexed, so calling get_pretrained_model with the argument embeddings_layers_to_save=(23,) gives you the embeddings after the last transformer layer of the model, which correspond to the second-to-last tensor in Hugging Face's hidden_states. Calling it with embeddings_layers_to_save=(24,) instead gives you the embeddings after the first layer norm of the RoBERTa LM head, which correspond to the last tensor returned in Hugging Face's hidden_states.
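And a corresponding sketch with the JAX repository (based on the README usage pattern; exact argument names may differ between repo versions), saving both layers at once so the two embeddings can be compared directly:

```python
import haiku as hk
import jax
import jax.numpy as jnp
from nucleotide_transformer.pretrained import get_pretrained_model

# Save embeddings after the last transformer layer (23) and after the
# first layer norm of the RoBERTa LM head (24).
parameters, forward_fn, tokenizer, config = get_pretrained_model(
    model_name="500M_human_ref",
    embeddings_layers_to_save=(23, 24),
    max_positions=32,
)
forward_fn = hk.transform(forward_fn)

sequences = ["ATTCCGATTCCGATTCCG"]
tokens_ids = [b[1] for b in tokenizer.batch_tokenize(sequences)]
tokens = jnp.asarray(tokens_ids, dtype=jnp.int32)

random_key = jax.random.PRNGKey(0)
outs = forward_fn.apply(parameters, random_key, tokens)

# "embeddings_23" matches hidden_states[-2] from the Hugging Face model,
# "embeddings_24" matches hidden_states[-1].
print(outs["embeddings_23"].shape, outs["embeddings_24"].shape)
```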

I will add a note to the nucleotide-transformer GitHub repository to clarify how to retrieve the embeddings in the same way as with the Hugging Face model.

Hope that solves everything on your end!
