Can ProtGPT2 be used in "Fill-Mask" tasks?
Thanks for the excellent work!
I am wondering whether ProtGPT2 can be used in "fill-mask" tasks.
To be more specific, say I have a sequence:
"MTYKLIINGKTLKGETTTEAVDA"
Now I'd like to mutate the T2 site, i.e. fill in the blank at position 2 with ProtGPT2:
"M ? YKLIINGKTLKGETTTEAVDA"
I have tried:

    from transformers import pipeline
    unmasker = pipeline("fill-mask", model="nferruz/ProtGPT2")

and got:

    PipelineException: The tokenizer does not define a `mask_token`.
This is my first time using an NLP model, sorry about the naive question.
Thanks.
Hi Likun,
As it is, ProtGPT2 cannot be used in a fill-mask problem, since it was trained with an autoregressive objective (predicting the next token). It could be done with some fine-tuning, but I haven't done this yet.
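That said, if you want to stay with ProtGPT2 without fine-tuning, one workaround is to score each of the 20 candidate residues at position 2 with the model's autoregressive likelihood and rank the variants. Here is a minimal sketch, assuming the standard transformers causal-LM API; the plain-sequence input and the likelihood-ranking heuristic are my own assumptions, not an official ProtGPT2 recipe:

    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("nferruz/ProtGPT2")
    model = AutoModelForCausalLM.from_pretrained("nferruz/ProtGPT2")
    model.eval()

    # Substitute each of the 20 standard amino acids at position 2 (T2).
    template = "M{}YKLIINGKTLKGETTTEAVDA"
    amino_acids = "ACDEFGHIKLMNPQRSTVWY"

    scores = {}
    for aa in amino_acids:
        ids = tokenizer(template.format(aa), return_tensors="pt").input_ids
        with torch.no_grad():
            # loss is the mean negative log-likelihood of the whole sequence
            scores[aa] = model(input_ids=ids, labels=ids).loss.item()

    # Lower loss = the model finds the variant more plausible.
    for aa, nll in sorted(scores.items(), key=lambda kv: kv[1])[:5]:
        print(aa, round(nll, 3))

Note that because ProtGPT2 uses a BPE tokenizer, a one-residue change can alter the tokenization of neighbouring residues, which is why this scores the whole sequence rather than a single position.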
For your problem, you could directly use a denoising autoencoding model, like ESM1 and ESM2, or ProtT5. There are many more, and they are all publicly available. Let me know if you have questions when you give it a try!
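For example, here is a minimal sketch with the fill-mask pipeline and an ESM-2 checkpoint (I am assuming facebook/esm2_t12_35M_UR50D here; any ESM-2 size should behave the same, since their tokenizers define a <mask> token):

    from transformers import pipeline

    unmasker = pipeline("fill-mask", model="facebook/esm2_t12_35M_UR50D")

    # Mask position 2 (T2) of the query sequence.
    masked = "M<mask>YKLIINGKTLKGETTTEAVDA"
    for prediction in unmasker(masked, top_k=5):
        print(prediction["token_str"], round(prediction["score"], 3))

Each prediction is a candidate amino acid for the masked site, ranked by the model's probability.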
Noelia
Thanks! This info is really helpful!
Likun