Is it possible to decrease vocabulary size?
#50 · opened by omers66
Is there a way to access the tokenizer's token frequency distribution? I would like to decrease the vocabulary size to speed things up. Ideally I could keep the most frequent tokens.
Thanks
Shrinking the vocabulary isn't only a tokenizer change: you need to modify the embedding layer and the language modeling head as well, since both are sized to the vocabulary.
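Concretely, a minimal sketch of that resizing (not an official recipe): it assumes a GPT-2 checkpoint from `transformers`, and that `keep_ids` already holds the token ids you decided to keep, e.g. from a frequency count over your own corpus.

```python
# Minimal sketch: slice the embedding (and tied LM head) down to a kept subset.
# Assumes GPT-2; `keep_ids` is a placeholder for your real list of token ids.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")
keep_ids = torch.tensor([0, 1, 2])  # placeholder; use the ids you want to keep

# Copy only the kept rows into a smaller embedding matrix.
old_embed = model.get_input_embeddings()
new_embed = torch.nn.Embedding(len(keep_ids), old_embed.embedding_dim)
with torch.no_grad():
    new_embed.weight.copy_(old_embed.weight[keep_ids])
model.set_input_embeddings(new_embed)

# GPT-2 ties the LM head to the input embedding, so re-tying is enough here;
# an untied model would need its output projection sliced the same way.
model.tie_weights()
model.config.vocab_size = len(keep_ids)
```

Note the tokenizer still emits the old ids after this, so you also have to rebuild the tokenizer files or remap each old id to its position in `keep_ids` before feeding the model.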
Yes, but what about the histogram of tokens? I would like to remove the uncommon ones...
Thanks
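The tokenizer itself doesn't store a frequency histogram; you'd have to count one over a corpus of your own and keep the most common ids. A minimal sketch (the corpus path and the 10,000-token cutoff are placeholders):

```python
# Minimal sketch: count token frequencies over your own corpus.
from collections import Counter
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
counts = Counter()
with open("corpus.txt", encoding="utf-8") as f:  # placeholder path
    for line in f:
        counts.update(tokenizer.encode(line))

# The most frequent ids become the kept vocabulary; this feeds the
# embedding-resizing sketch above as `keep_ids`.
keep_ids = sorted(token_id for token_id, _ in counts.most_common(10_000))
```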