Correct Transformers Pad Token #7
by patrickvonplaten

No description provided.

import open_clip

# open_clip tokenizer for the bigG model; pads to the 77-token context length
tokenizer = open_clip.get_tokenizer('ViT-bigG-14')
print(tokenizer("hello"))
gives:
tensor([[49406,  3306, 49407,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0]])
which means the padding token should be 0, not 49407.
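The mismatch is visible directly from the tokenizer attributes. A minimal check (the output comment reflects the pre-fix config, where the pad token was set to the end-of-text token):

from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("laion/CLIP-ViT-bigG-14-laion2B-39B-b160k")
# pre-fix this prints: <|endoftext|> 49407 (the eos token, not id 0, which is "!")
print(tokenizer.pad_token, tokenizer.pad_token_id)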
This PR corrects the Hugging Face Transformers version so that it matches the open_clip tokenizer:
from transformers import CLIPTokenizer
tokenizer = CLIPTokenizer.from_pretrained("laion/CLIP-ViT-bigG-14-laion2B-39B-b160k")
print(tokenizer("hello", max_length=77, padding="max_length", truncation=True))
patrickvonplaten changed pull request status to open
patrickvonplaten changed pull request title from "Correct pad token tokenizer" to "Correct Transformers Pad Token"
@patrickvonplaten @julien-c it is indeed wrong, but as mentioned in Slack, this probably means that all HF Transformers-based tokenizers for OpenCLIP, and probably the OpenAI originals as well, are wrong, since the OpenCLIP Transformers tokenizer configs were just copied from the openai/ ones on the Hub. I can't merge as I'm not the owner; that's @mitchellw.
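A quick way to test that claim against the originals (a sketch; openai/clip-vit-base-patch32 is assumed here as a representative OpenAI repo):

from transformers import CLIPTokenizer

# the OpenCLIP tokenizer configs were reportedly copied from the openai/ repos
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
# if the claim holds, this also prints <|endoftext|> 49407
print(tokenizer.pad_token, tokenizer.pad_token_id)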
@patrickvonplaten so I have write access and can merge this now. Is this still a desired change to make it match the original tokenizer, or do you think people are relying on the current behaviour?