Convert tokens to their original form
Hello, I am wondering whether it is possible to convert tokens back to their original words. For example, can the token 'solu' be converted back into 'solution'?
Hi, transformer models take tokens rather than words as input. The tokenizer sometimes splits a single word into multiple tokens.
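A fragment such as 'solu' cannot be expanded on its own, but the original word can usually be recovered by joining it with the tokens that follow it. Below is a minimal sketch, assuming a SentencePiece-style tokenizer that marks the start of each word with "▁". The function name `merge_subword_tokens` and the example token list are hypothetical and only for illustration; other tokenizers use different conventions (for example a "##" prefix on continuation pieces), and special tokens would need to be filtered out first.

```python
def merge_subword_tokens(tokens, word_start="▁"):
    """Group SentencePiece-style subword tokens back into whole words.

    Assumption: a token that begins with `word_start` opens a new word,
    and every other token continues the previous word.
    """
    words = []
    for token in tokens:
        if token.startswith(word_start):
            # Word-start marker: open a new word without the marker.
            words.append(token[len(word_start):])
        elif words:
            # Continuation piece: glue it onto the current word.
            words[-1] += token
        else:
            # First token carries no marker: treat it as a new word.
            words.append(token)
    # Drop empty strings left by tokens that were only the marker.
    return [w for w in words if w]


# Hypothetical tokenizer output for "The solution works".
tokens = ["▁The", "▁solu", "tion", "▁works"]
print(merge_subword_tokens(tokens))  # ['The', 'solution', 'works']
```

If the tokenizer comes from a library, it likely already offers its own decoding routine, which would be preferable to hand-written merging like this.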
Thank you very much! I would like to know whether you think your model could be used to evaluate the quality of generated or extracted keywords. For example, for each document in a list, I generate or extract a list of keywords, then embed all the keywords and documents using the colbert matrix (maybe I could use the 2 other matrices as well). Finally, I calculate a score for each keyword against each document. If a keyword's source document gets the highest score for that keyword, I can say that the keyword is a good one for that document.
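The scoring step could be sketched roughly as follows. This assumes that the "colbert matrix" is a list of per-token vectors for each text and that the score is the usual late-interaction (MaxSim) score: for every keyword token, take its best dot product against the document tokens, then average. The function names and the toy vectors are made up for illustration; real vectors would come from the model.

```python
def maxsim_score(keyword_vecs, doc_vecs):
    """Late-interaction score: average, over keyword tokens, of the
    best dot product against any document token."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    best_per_token = [max(dot(k, d) for d in doc_vecs) for k in keyword_vecs]
    return sum(best_per_token) / len(best_per_token)


def keyword_matches_source(keyword_vecs, docs, source):
    """Return (is_match, scores): is_match is True when the keyword's
    source document has the highest score among all documents."""
    scores = {name: maxsim_score(keyword_vecs, vecs) for name, vecs in docs.items()}
    top = max(scores, key=scores.get)
    return top == source, scores


# Toy 2-dimensional token vectors (made up, not model output).
docs = {
    "doc_a": [[1.0, 0.0], [0.0, 1.0]],
    "doc_b": [[0.0, 1.0]],
}
keyword = [[1.0, 0.0]]  # a keyword extracted from doc_a

is_match, scores = keyword_matches_source(keyword, docs, source="doc_a")
print(is_match, scores)  # True {'doc_a': 1.0, 'doc_b': 0.0}
```

One caveat of this check is that ties and near-ties between documents are resolved arbitrarily, so a margin between the top two scores may be a more informative signal than the top rank alone.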
What do you think of this approach? I look forward to your insights.
It seems like a possible approach. However, we haven't done this type of task before and cannot offer more advice.