Inference code? #1
opened by 04RR
Hey, love the idea. I was trying to test the model with the code below -
from transformers import AutoProcessor, GroundingDinoForObjectDetection
import torch
from PIL import Image, ImageDraw
import matplotlib.pyplot as plt
url = "test.png"
img = Image.open(url).convert("RGB")
draw = ImageDraw.Draw(img)
model_id = "rawhad/grounding-dino-base-screen-ai-v1"
text = "" # not sure what to put here, is it required?
image_processor = AutoProcessor.from_pretrained(model_id)
model = GroundingDinoForObjectDetection.from_pretrained(model_id)
inputs = image_processor(images=img, text=text, return_tensors="pt")
outputs = model(**inputs)
target_sizes = torch.tensor([img.size[::-1]])
results = image_processor.image_processor.post_process_object_detection(
    outputs, threshold=0.85, target_sizes=target_sizes
)[0]
print(results)
Not sure what needs to go in for text (or whether it is needed at all). I have tried an empty string and "ui components", and neither gave me any results.
Thank you for your work!
Hey there @04RR
Actually I have been trying to train this model but cannot get it to work right, so it is still a WIP. The code you wrote is correct. A few suggestions:
- Start with the original GroundingDINO model.
- Text is the description of the object you want to detect in the image, e.g. "a cat lying on the sofa".
- For the post-processing, use processor.post_process_grounded_object_detection like below:
results = processor.post_process_grounded_object_detection(
    outputs,
    inputs.input_ids,
    box_threshold=0.28,
    text_threshold=0.0,
    target_sizes=[image.size[::-1]]
)[0]
You can play with the box and text thresholds.
- Box threshold: confidence threshold for the bounding box itself
- Text threshold: confidence threshold for the match between the text query and the region inside the bounding box
For now this model is still a WIP. Will update when it's trained.