Finetuning script using HuggingFace (No llama-factory)

#32

by 2U1 - opened Sep 11

Discussion

2U1

Sep 11

edited Sep 11

https://github.com/2U1/Qwen2-VL-Finetune

I wrote this code for anyone who wants to fine-tune with the Hugging Face version and, like me, has difficulty using other frameworks.

This code uses only Hugging Face libraries to fine-tune the 7B and 2B models.

You can also set a different learning_rate for the vision_model and the language_model (and for the merger as well).

Feedback and issues are welcome!
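
For readers unfamiliar with per-module learning rates, here is a minimal sketch of how such groups could be built. It is not taken from the linked repository. It assumes that vision-tower parameter names contain "visual" and merger parameter names contain "visual.merger", which may differ between transformers versions; the function name and the learning-rate values in the test-style usage are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def build_param_groups(
    named_params: Iterable[Tuple[str, object]],
    vision_lr: float,
    merger_lr: float,
    language_lr: float,
) -> List[Dict]:
    """Split (name, parameter) pairs into optimizer groups by name.

    ASSUMPTION: names containing "visual.merger" belong to the merger,
    other names containing "visual" belong to the vision tower, and
    everything else belongs to the language model.
    """
    groups = {
        "vision": {"params": [], "lr": vision_lr},
        "merger": {"params": [], "lr": merger_lr},
        "language": {"params": [], "lr": language_lr},
    }
    for name, param in named_params:
        # Check the merger first, because its names also contain "visual".
        if "visual.merger" in name:
            groups["merger"]["params"].append(param)
        elif "visual" in name:
            groups["vision"]["params"].append(param)
        else:
            groups["language"]["params"].append(param)
    # Skip groups that ended up with no parameters.
    return [group for group in groups.values() if group["params"]]
```

The returned list has the shape that PyTorch optimizers accept for parameter groups, so it could be passed to something like torch.optim.AdamW(build_param_groups(model.named_parameters(), ...)).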

tanliboy

Sep 11

Thanks for sharing it! Any video demo with this fine-tuning codebase?

2U1

Sep 11

@tanliboy I'm working on fine-tuning with video. The code will be updated soon!

2U1

Sep 11

@tanliboy I've updated the code for video training! Do you need an inference demo with video via CLI or Gradio?

Anu0202

Sep 14

@2U1 Thanks for the scripts for LoRA tuning the model.

I was trying to fine-tune it on a small dataset of ~2000 samples (single-image, single-turn QA).

I am running on Kaggle with 29 GB of RAM and 2 × T4 GPUs with 15 GB each, but I always hit CUDA OOM (with no offload, or with only params offloaded) and RAM OOM if both params and optimizer are offloaded to CPU. Is there any way around this? What compute do you suggest?

Also, I am using the 2B-parameter model for now. Can you shed some light on this? Thanks!

2U1

Sep 15

@Anu0202 Thanks for your interest!
Training takes a lot of memory, so you should use offloading and decrease the max pixel value.
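
To give a feel for what lowering the max pixel value does, here is a rough sketch. It relies on the Qwen2-VL model card's convention of expressing pixel budgets as a token count times 28*28; the helper names are hypothetical, and the token estimate ignores min_pixels and the rounding of image sides that the real processor performs.

```python
PIXELS_PER_TOKEN = 28 * 28  # one visual token covers roughly a 28x28 pixel area


def pixel_budget(max_tokens: int) -> int:
    """Convert a visual-token budget into a max_pixels value."""
    return max_tokens * PIXELS_PER_TOKEN


def approx_visual_tokens(width: int, height: int, max_pixels: int) -> int:
    """Roughly estimate visual tokens for an image under a max_pixels cap.

    ASSUMPTION: this ignores min_pixels and side-length rounding, so the
    result is only an estimate of what the processor would produce.
    """
    capped_pixels = min(width * height, max_pixels)
    return capped_pixels // PIXELS_PER_TOKEN


# A 1920x1080 image capped at a 256-token budget.
budget = pixel_budget(256)
print(budget, approx_visual_tokens(1920, 1080, budget))
```

The model card passes such a value when loading the processor, for example AutoProcessor.from_pretrained(model_id, max_pixels=budget). How the fine-tuning scripts in the linked repository expose this setting is not stated in this thread.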

tanliboy

Sep 16

Thanks, @Anu0202! Will try it out.

lucreziaT

10 days ago

Hello, thank you for sharing the code! I followed all the instructions, so I have an environment with all the packages installed and the training dataset in the right format.
When I launch the fine-tuning with: bash scripts/finetune_lora_vision.sh --data_path my.json --image_folder myfolder --model_id '/anaconda3/envs/qwen2/lib/python3.10/site-packages/transformers/models/qwen2_vl/'
I get many errors related to the flash_attn package: 'ImportError: /anaconda3/envs/qwen2/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZNK3c105Error4whatEv'
Do you have any clue what the problem could be? My flash_attn version is 2.5.8, Python is 3.10.14, CUDA is 12.6.77, and I am working on Ubuntu 20.04.6.

2U1

9 days ago

@lucreziaT In that case, you can downgrade torch to torch==2.3.0.
I'll test some other version combinations as well.
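
The undefined symbol in the error above demangles to c10::Error::what() const, which lives in PyTorch's core library, so the error is consistent with a flash_attn binary built against a different torch version than the one installed. A small sketch for listing the installed versions before changing anything follows; the function name and default package list are hypothetical, and it only reads package metadata without importing the packages.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(
    packages: Iterable[str] = ("torch", "flash-attn", "transformers"),
) -> Dict[str, Optional[str]]:
    """Return the installed version of each package, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Print one line per package, e.g. to paste into a bug report.
for package, version in installed_versions().items():
    print(f"{package}: {version or 'not installed'}")
```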

lucreziaT

6 days ago

Hello, in the end I had to downgrade CUDA to version 12.1.
I now have a new issue:
RuntimeError: shape mismatch: value tensor of shape [256, 3584] cannot be broadcast to indexing result of shape [0, 3584]
I see from https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct/discussions/33 that I should add a processor.apply_chat_template call, but I don't know where. Do you have any clue?

2U1

5 days ago

@lucreziaT Does your data look like this?

[
  {
    "id": "000000033471",
    "image": "000000033471.jpg",
    "conversations": [
      {
        "from": "human",
        "value": "<image>\nWhat are the colors of the bus in the image?"
      },
      {
        "from": "gpt",
        "value": "The bus in the image is white and red."
      }
    ]
  }
  ...
]

When you use my code, the text must contain <image>\n.
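
An indexing result of shape [0, 3584] in the error above is consistent with no image placeholder being found in the text. A minimal sketch for checking a dataset in the format shown above could look like this; the function name is hypothetical, and it assumes the placeholder should appear in a "human" turn of every record that has an "image" field.

```python
from typing import Dict, List


def find_missing_image_tokens(records: List[Dict], token: str = "<image>") -> List:
    """Return the ids of records that have an image but no placeholder token."""
    missing = []
    for record in records:
        if "image" not in record:
            continue  # text-only record, nothing to check
        human_text = " ".join(
            turn.get("value", "")
            for turn in record.get("conversations", [])
            if turn.get("from") == "human"
        )
        if token not in human_text:
            missing.append(record.get("id"))
    return missing


# Two illustrative records: the second lacks the placeholder.
sample = [
    {"id": "ok", "image": "a.jpg", "conversations": [
        {"from": "human", "value": "<image>\nWhat is shown?"},
        {"from": "gpt", "value": "A bus."}]},
    {"id": "bad", "image": "b.jpg", "conversations": [
        {"from": "human", "value": "What is shown?"},
        {"from": "gpt", "value": "A car."}]},
]
print(find_missing_image_tokens(sample))  # prints ['bad']
```

For a real dataset, the records would come from loading the JSON file passed as --data_path, for example with json.load.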
