How to load with HF Transformers?
#17 · opened by jhflow
Hi, thank you for your remarkable work! I'm really impressed by the performance of this model.
For some reason, I want to load this model via Hugging Face Transformers (`AutoModel.from_pretrained` or similar) rather than via FlagEmbedding. Can I do so?
Yes, you can load it in the same way as bge-1.5: https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/baai_general_embedding#using-huggingface-transformers
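For example, here is a minimal sketch following the bge-1.5 instructions linked above (CLS pooling followed by L2 normalization; the sentences are placeholders):

```python
from transformers import AutoModel, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-m3')
model = AutoModel.from_pretrained('BAAI/bge-m3')
model.eval()

sentences = ["sample query", "sample passage"]
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
with torch.no_grad():
    outputs = model(**inputs)
    # CLS pooling: take the hidden state of the first token as the embedding
    embeddings = outputs.last_hidden_state[:, 0]
# L2-normalize so that dot products equal cosine similarities
embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=-1)
print(embeddings.shape)  # (2, hidden_size)
```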
jhflow changed discussion status to closed
Thank you!
How can I get dense and ColBERT embeddings with transformers?
Given
```python
from transformers import AutoModel, AutoTokenizer
import torch

model_path = 'BAAI/bge-m3'
model = AutoModel.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

test_sentence = ["this is a test sentence"]
batch_dict = tokenizer(test_sentence, return_tensors='pt', max_length=128,
                       padding=True, truncation=True)
# inference only, so no gradients needed
with torch.no_grad():
    outputs = model(**batch_dict)
```
I get a `BaseModelOutputWithPoolingAndCrossAttentions` with `pooler_output` and `last_hidden_state` keys. Is `pooler_output` the CLS embedding and `last_hidden_state` all the token embeddings? Kindly clarify. Thank you.
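For reference, a minimal sketch of how these outputs map to BGE-M3's embedding types, assuming the convention from the FlagEmbedding repo that the dense embedding is the L2-normalized CLS hidden state; note that the ColBERT projection head mentioned in the comments is a separate weight file in the model repo that `AutoModel` does not load:

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_path = 'BAAI/bge-m3'
model = AutoModel.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)
model.eval()

batch_dict = tokenizer(["this is a test sentence"], return_tensors='pt',
                       max_length=128, padding=True, truncation=True)
with torch.no_grad():
    outputs = model(**batch_dict)

# last_hidden_state: (batch, seq_len, hidden) -- one vector per input token
token_embeddings = outputs.last_hidden_state

# Dense embedding (bge convention): the CLS token's hidden state at
# position 0, L2-normalized
dense_embedding = torch.nn.functional.normalize(token_embeddings[:, 0], dim=-1)

# pooler_output is NOT the raw CLS embedding: it is the CLS hidden state
# passed through the base architecture's extra dense+tanh pooler layer,
# which BGE-M3 does not use for its dense representation.

# ColBERT-style multi-vectors: FlagEmbedding projects the token states
# through a separate colbert_linear head shipped alongside the checkpoint,
# which plain AutoModel does not load; last_hidden_state only gives you
# the unprojected per-token embeddings.
```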