Is it possible to only input text in LLaVa model?

#38
by Tizzzzy - opened

Hi,
Currently I can successfully do image question answering with the LLaVa model using the following code:

from transformers import AutoModelForImageTextToText, AutoProcessor

processor = AutoProcessor.from_pretrained("llava-hf/llava-1.5-7b-hf")
model = AutoModelForImageTextToText.from_pretrained("llava-hf/llava-1.5-7b-hf", device_map="auto")

def llava_describe(image):
    # The <image> placeholder marks where the image features go in the prompt.
    question = "<image> Describe this image as detail as possible."
    inputs = processor(images=image, text=question, return_tensors="pt").to(model.device)
    generated_ids = model.generate(**inputs, max_new_tokens=200)
    answer = processor.decode(generated_ids[0][2:], skip_special_tokens=True)
    return answer

I also want to give the model text-only input. However, my code doesn't work:

def llava_describe(image):
    question = "..."
    # No image is passed here, so the processor produces no pixel values.
    inputs = processor(images=None, text=question, return_tensors="pt").to(model.device)
    generated_ids = model.generate(**inputs, max_new_tokens=200)
    answer = processor.decode(generated_ids[0][2:], skip_special_tokens=True)
    return answer

I keep getting this error:

Traceback (most recent call last):
  File "/workspace/llava/model.py", line 138, in <module>
    generated_text = llava_describe(image)
  File "/workspace/llava/model.py", line 48, in llava_describe
    generated_ids = model.generate(**inputs, max_new_tokens=200)
  File "/opt/conda/envs/llava/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/envs/llava/lib/python3.10/site-packages/transformers/generation/utils.py", line 2215, in generate
    result = self._sample(
  File "/opt/conda/envs/llava/lib/python3.10/site-packages/transformers/generation/utils.py", line 3206, in _sample
    outputs = self(**model_inputs, return_dict=True)
  File "/opt/conda/envs/llava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/envs/llava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/envs/llava/lib/python3.10/site-packages/transformers/models/llava/modeling_llava.py", line 487, in forward
    inputs_embeds, attention_mask, labels, position_ids = self._merge_input_ids_with_image_features(
  File "/opt/conda/envs/llava/lib/python3.10/site-packages/transformers/models/llava/modeling_llava.py", line 303, in _merge_input_ids_with_image_features
    num_images, num_image_patches, embed_dim = image_features.shape
AttributeError: 'NoneType' object has no attribute 'shape'

This task is important to me, and I would really like LLaVa to support text-only input as well.
Thank you for your help!

Llava Hugging Face org

Hey @Tizzzzy !

Currently, Llava models do not support text-only input. I have been changing a lot of stuff lately in the Llava models and will bring back text-only inference soon. It was removed accidentally and shouldn't have been.
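
For anyone hitting the same traceback: the failing line unpacks `image_features.shape`, and with no image passed there are no image features, so the value is `None`. Below is a toy sketch of that failure and of the kind of guard a text-only path would need. The function names and the placeholder embeddings are hypothetical, made up for illustration; this is not the transformers implementation and not necessarily how the fix will be done.

```python
# Toy illustration only: these functions are hypothetical stand-ins,
# not the actual transformers code.

def merge_without_guard(text_embeds, image_features):
    # Mirrors the failing line in the traceback: reading .shape
    # raises AttributeError when image_features is None.
    num_images, num_image_patches, embed_dim = image_features.shape
    return text_embeds


def merge_with_guard(text_embeds, image_features=None):
    # A text-only path returns the text embeddings untouched
    # when there are no image features to merge.
    if image_features is None:
        return text_embeds
    num_images, num_image_patches, embed_dim = image_features.shape
    # A real model would splice the image features into text_embeds here.
    return text_embeds


text_embeds = [[0.1, 0.2], [0.3, 0.4]]  # placeholder for token embeddings

try:
    merge_without_guard(text_embeds, None)
    error_message = None
except AttributeError as err:
    error_message = str(err)

guarded_result = merge_with_guard(text_embeds, None)
```

Here `error_message` captures the same kind of `AttributeError` as in the traceback, while `merge_with_guard` simply passes the text embeddings through when no image is given.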
