BAAI/bge-m3 · convert token to the original form

Mar 15

Hello, I am wondering whether it is possible to convert tokens back to their original word form. For example, can the token 'solu' be converted back into 'solution'?

Shitao

Beijing Academy of Artificial Intelligence org Mar 17

Hi, transformer models take tokens rather than words as input. A single word may sometimes be split into multiple tokens by the tokenizer.
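To illustrate the point: a lone piece such as 'solu' cannot be mapped back to 'solution' by itself, but if you keep the full token sequence of one input, adjacent pieces can be merged back into words. Below is a minimal sketch, assuming SentencePiece-style tokens in which "▁" (U+2581) marks the start of a new word, as in XLM-RoBERTa-style tokenizers. The helper name `merge_pieces` and the example token list are hypothetical, not part of any library.

```python
# Assumption: tokens follow the SentencePiece convention where a leading
# "\u2581" marks the beginning of a new word. merge_pieces is a
# hypothetical helper written for this illustration only.
BOUNDARY = "\u2581"


def merge_pieces(pieces, boundary=BOUNDARY):
    """Merge subword pieces from one tokenized input back into whole words."""
    words = []
    for piece in pieces:
        if piece.startswith(boundary) or not words:
            # A boundary marker (or the very first piece) starts a new word.
            words.append(piece.lstrip(boundary))
        else:
            # A piece without the marker continues the previous word.
            words[-1] += piece
    # Drop empty strings produced by pieces that were only the marker.
    return [w for w in words if w]


# Hand-written example pieces, not actual tokenizer output.
example_pieces = [BOUNDARY + "The", BOUNDARY + "solu", "tion", BOUNDARY + "works"]
merged_words = merge_pieces(example_pieces)
print(merged_words)  # ['The', 'solution', 'works']
```

This only recovers words that were present in the original input; it does not guess a word from an isolated token.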

DAIEF

Mar 20

Thank you very much! Do you think it would be useful to use your model to evaluate the quality of generated or extracted keywords? For example, for each document in a document list, I generate or extract a list of keywords, then embed all the keywords and documents using the ColBERT vectors (perhaps I could use the model's other two output types as well). Finally, I compute a score between each keyword list and each document. If a keyword list scores highest against its own source document, I can conclude that the keywords for that document are good.
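To make the scoring step concrete, here is a minimal sketch assuming a ColBERT-style late-interaction score: for each keyword token vector, take the maximum dot product over the document's token vectors, then average over the keyword tokens. The function names (`colbert_score`, `best_document`) and the toy 2-dimensional vectors are hypothetical stand-ins; in practice the vectors would come from the model's multi-vector output, and the exact normalization the model's own scoring uses may differ.

```python
# Assumption: each text is represented by a list of per-token vectors.
# The vectors below are toy values, not real model output.


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def colbert_score(query_vecs, doc_vecs):
    """Late interaction: mean over query tokens of the max similarity
    to any document token."""
    per_token_max = [max(dot(q, d) for d in doc_vecs) for q in query_vecs]
    return sum(per_token_max) / len(query_vecs)


def best_document(keyword_vecs, docs):
    """Score one keyword list against every document; return the
    top-scoring document id together with all scores."""
    scores = {doc_id: colbert_score(keyword_vecs, vecs) for doc_id, vecs in docs.items()}
    return max(scores, key=scores.get), scores


# Keywords extracted from doc_a (toy token vectors).
keyword_vecs = [[1.0, 0.0], [0.0, 1.0]]
docs = {
    "doc_a": [[1.0, 0.0], [0.0, 1.0]],
    "doc_b": [[1.0, 0.0], [1.0, 0.0]],
}

top_doc, scores = best_document(keyword_vecs, docs)
print(top_doc, scores)  # doc_a {'doc_a': 1.0, 'doc_b': 0.5}
```

Under this sketch, the keywords pass the check when `top_doc` equals the document they were extracted from.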

What do you think of this approach? I look forward to your insights.

Shitao

Beijing Academy of Artificial Intelligence org Mar 22

It seems like a possible approach. However, we haven't done this type of task before and cannot offer more advice.