
	---
	license: apache-2.0
	pipeline_tag: image-text-to-text
	---

	A fine-tuned version of moondream2 for generating prompts from images. Moondream is a small vision language model designed to run efficiently on edge devices. For details on the base model, see the [GitHub repository](https://github.com/vikhyat/moondream), or try the base model on the [Hugging Face Space](https://huggingface.co/spaces/vikhyatk/moondream2).

	Usage

	```bash
	pip install torch pillow transformers timm einops bitsandbytes accelerate
	```

	```python
	import torch
	from transformers import AutoTokenizer, AutoModelForCausalLM
	from PIL import Image

	DEVICE = "cuda"  # set to "cpu" if no GPU is available
	# Use float32 on CPU and float16 on GPU.
	DTYPE = torch.float32 if DEVICE == "cpu" else torch.float16
	# Pin the revision so the weights and remote code stay reproducible.
	revision = "1082928d0aa39290a31a92f632ca670458eda512"
	tokenizer = AutoTokenizer.from_pretrained("gokaygokay/moondream-prompt", revision=revision)
	moondream = AutoModelForCausalLM.from_pretrained(
	    "gokaygokay/moondream-prompt",
	    trust_remote_code=True,
	    torch_dtype=DTYPE,
	    device_map={"": DEVICE},
	    revision=revision,
	)
	moondream.eval()

	image_path = "<image_path>"
	image = Image.open(image_path).convert("RGB")
	md_answer = moondream.answer_question(
	    moondream.encode_image(image),
	    "Describe this image and its style in a very detailed manner",
	    tokenizer=tokenizer,
	)

	print(md_answer)
	```
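
To generate prompts for a whole folder of images, the single-image call above can be wrapped in a small loop. The following is a minimal sketch, not part of the model's API: `caption_folder`, `IMAGE_SUFFIXES`, and the `describe` callback are hypothetical names introduced here for illustration. The callback is expected to take an image path and return the generated text, so the helper itself does not depend on the model.

```python
from pathlib import Path
from typing import Callable, Dict

# File extensions treated as images (an assumption; extend as needed).
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def caption_folder(folder: str, describe: Callable[[Path], str]) -> Dict[str, str]:
    """Map each image file name in `folder` to the text returned by `describe`."""
    prompts = {}
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            # Strip surrounding whitespace from the generated text.
            prompts[path.name] = describe(path).strip()
    return prompts


# Hypothetical usage, with `moondream` and `tokenizer` loaded as in the snippet above:
#
# def describe(path):
#     image = Image.open(path).convert("RGB")
#     return moondream.answer_question(
#         moondream.encode_image(image),
#         "Describe this image and its style in a very detailed manner",
#         tokenizer=tokenizer,
#     )
#
# prompts = caption_folder("images/", describe)
```

Passing the model call in as a callback keeps the folder traversal separate from inference, so the loop can be checked with a stub before the weights are loaded.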

	Example
	![image/png](https://cdn-uploads.huggingface.co/production/uploads/630899601dd1e3075d975785/-x5jO3xnQrUz1uYO9SHji.png)

	"a very angry old man with white hair and a mustache, in the style of a Pixar movie, hyperrealistic, white background, 8k"