OpenGVLab/InternViT-6B-448px-V1-5 for Zero-Shot Image Classification

#4
by iavinas - opened

Hi,

Thanks for sharing the model.

I am trying to use a Vision Foundation Model for a zero-shot classification problem.

It is possible with OpenGVLab/InternVL-14B-224px, but I am not able to do so with OpenGVLab/InternViT-6B-448px-V1-5.

import torch
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained('OpenGVLab/InternViT-6B-448px-V1-5', torch_dtype=torch.bfloat16, low_cpu_mem_usage=True, trust_remote_code=True).cuda().eval()

tokenizer = AutoTokenizer.from_pretrained('OpenGVLab/InternViT-6B-448px-V1-5', use_fast=False, add_eos_token=True, trust_remote_code=True)

Is there any way to get the tokenizer for OpenGVLab/InternViT-6B-448px-V1-5?

OpenGVLab org

Hi, the difficulty you're experiencing arises from the fact that OpenGVLab/InternViT-6B-448px-V1-5 is designed primarily as a vision encoder for building multimodal large language models, not for zero-shot image classification. It ships no text tower or tokenizer, so it doesn't offer the same functionality as OpenGVLab/InternVL-14B-224px, which is a CLIP-like model suitable for zero-shot image classification tasks.
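In practice, the InternViT-6B-448px-V1-5 repo provides only an image processor and the vision transformer, so the most you can get from it directly is image embeddings. A minimal sketch along the lines of its model card (the image path is a placeholder):

```python
import torch
from PIL import Image
from transformers import AutoModel, CLIPImageProcessor

# Vision encoder only: produces image embeddings, no text tower, no tokenizer.
model = AutoModel.from_pretrained(
    'OpenGVLab/InternViT-6B-448px-V1-5',
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True).cuda().eval()

image_processor = CLIPImageProcessor.from_pretrained('OpenGVLab/InternViT-6B-448px-V1-5')

image = Image.open('example.jpg').convert('RGB')  # placeholder image path
pixel_values = image_processor(images=image, return_tensors='pt').pixel_values
pixel_values = pixel_values.to(torch.bfloat16).cuda()

outputs = model(pixel_values)  # image features only (last_hidden_state / pooler_output)
```

For zero-shot classification itself, InternVL-14B-224px exposes a CLIP-style contrastive forward pass. The sketch below is adapted from that model card; the `summarize:` text prefix, `tokenizer.pad_token_id = 0`, and the `mode='InternVL-C'` call follow the card's example and should be double-checked against it, and the image path and class names are placeholders:

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer, CLIPImageProcessor

model = AutoModel.from_pretrained(
    'OpenGVLab/InternVL-14B-224px',
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True).cuda().eval()

image_processor = CLIPImageProcessor.from_pretrained('OpenGVLab/InternVL-14B-224px')
tokenizer = AutoTokenizer.from_pretrained(
    'OpenGVLab/InternVL-14B-224px',
    use_fast=False, add_eos_token=True, trust_remote_code=True)
tokenizer.pad_token_id = 0  # padding id used in the model card example

image = Image.open('example.jpg').convert('RGB')   # placeholder image path
class_names = ['cat', 'dog', 'red panda']          # placeholder label set
prefix = 'summarize:'                              # text prefix from the model card
texts = [prefix + f'a photo of a {c}' for c in class_names]

pixel_values = image_processor(images=image, return_tensors='pt').pixel_values
pixel_values = pixel_values.to(torch.bfloat16).cuda()
input_ids = tokenizer(texts, return_tensors='pt', max_length=80,
                      truncation=True, padding='max_length').input_ids.cuda()

# CLIP-style contrastive forward pass (InternVL-C mode)
logits_per_image, logits_per_text = model(
    image=pixel_values, text=input_ids, mode='InternVL-C')
probs = logits_per_image.softmax(dim=-1)           # one probability per class name
print(dict(zip(class_names, probs[0].float().tolist())))
```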

czczup changed discussion status to closed
