---
license: apache-2.0
pipeline_tag: image-text-to-text
---

Fine-tuned version of moondream2 for generating prompts from images. Moondream is a small vision language model designed to run efficiently on edge devices. Check out the [GitHub repository](https://github.com/vikhyat/moondream) for details, or try it out on the [Hugging Face Space](https://huggingface.co/spaces/vikhyatk/moondream2)!

**Usage**

```bash
pip install transformers timm einops
```

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from PIL import Image

DEVICE = "cuda"  # set to "cpu" if no GPU is available
DTYPE = torch.float32 if DEVICE == "cpu" else torch.float16  # CPU doesn't support float16

# Load the tokenizer and the fine-tuned model (the model ships custom code,
# hence trust_remote_code=True).
tokenizer = AutoTokenizer.from_pretrained("gokaygokay/moondream-prompt")
moondream = AutoModelForCausalLM.from_pretrained(
    "gokaygokay/moondream-prompt",
    trust_remote_code=True,
    torch_dtype=DTYPE,
    device_map={"": DEVICE},
)
moondream.eval()

image_path = ""  # path to the input image
image = Image.open(image_path).convert("RGB")

# Encode the image once, then ask the model to describe it as a prompt.
md_answer = moondream.answer_question(
    moondream.encode_image(image),
    "Describe this image and its style in a very detailed manner",
    tokenizer=tokenizer,
)

print(md_answer)
```

**Example**

![image/png](https://cdn-uploads.huggingface.co/production/uploads/630899601dd1e3075d975785/-x5jO3xnQrUz1uYO9SHji.png)

Moondream answer: "a very angry old man with white hair and a mustache, in the style of a Pixar movie, hyperrealistic, white background, 8k"
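
**Batch usage (sketch)**

The usage snippet handles a single image. To generate prompts for a whole folder, it could be wrapped in a small helper like the one below. This is a minimal sketch, not part of the model's API: `caption_folder`, `describe`, and `IMAGE_SUFFIXES` are hypothetical names introduced here, and `describe` is assumed to be any callable that takes an image path and returns the generated prompt as a string.

```python
from pathlib import Path
from typing import Callable, Dict

# Assumption: file extensions treated as images; adjust as needed.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def caption_folder(folder: str, describe: Callable[[Path], str]) -> Dict[str, str]:
    """Map each image file name in `folder` to the prompt returned by `describe`."""
    prompts: Dict[str, str] = {}
    for path in sorted(Path(folder).iterdir()):
        # Skip sub-directories and non-image files.
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            prompts[path.name] = describe(path).strip()
    return prompts
```

With the model and tokenizer loaded as in the usage snippet, `describe` could open the path with `Image.open(path).convert("RGB")` and return the result of `moondream.answer_question(...)` for that image. Keeping the model call behind a callable means the folder loop can be tried with a stub function before loading any weights.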