bidiptas committed on
Commit
1c7ae7d
1 Parent(s): 94f5f0e

Update README.md

  1. README.md +5 -1
README.md CHANGED
@@ -9,7 +9,7 @@ pipeline_tag: image-to-text
 
 # PG-InstructBLIP model
 
-Finetuned version of InstructBLIP with Flan-T5-xxl as the language model. PG-InstructBLIP was introduced in the paper [Physically Grounded Vision-Language Models for Robotic Manipulation](https://iliad.stanford.edu/pg-vlm/) by Gao et al.
+Finetuned version of InstructBLIP with Flan-T5-XXL as the language model. PG-InstructBLIP was introduced in the paper [Physically Grounded Vision-Language Models for Robotic Manipulation](https://iliad.stanford.edu/pg-vlm/) by Gao et al.
 
 ## Model description
 
@@ -20,6 +20,8 @@ PG-InstructBLIP is finetuned using the [PhysObjects dataset](https://drive.googl
 
 This model is designed to be used with the LAVIS library. Please install [salesforce-lavis](https://pypi.org/project/salesforce-lavis/) and download this model through git-lfs or direct downloading.
 
+After loading the model, you can disable the qformer text input to follow the same configuration we used for fine-tuning. However, the model still works well with it enabled, so we recommend users to experiment with both and choose the optimal configuration on a case-by-case basis.
+
 ```
 import torch
 from PIL import Image
@@ -41,6 +43,8 @@ vlm = load_model(
 device="cuda" if torch.cuda.is_available() else "cpu"
 )
 
+vlm.qformer_text_input = False # Optionally disable qformer text
+
 model_cls = registry.get_model_class('blip2_t5_instruct')
 model_type = 'flant5xxl'
 preprocess_cfg = OmegaConf.load(model_cls.default_config_path(model_type)).preprocess
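
The README text added in this commit suggests trying the model with the qformer text input both disabled and enabled. A minimal sketch of how that comparison could be scripted follows. The helper names `qformer_text_input` (as a context manager) and `compare_qformer_settings` are hypothetical and not part of LAVIS or the README; the sketch assumes only that the loaded model has the `qformer_text_input` attribute shown in the diff and a `generate(samples, ...)` method.

```python
from contextlib import contextmanager


@contextmanager
def qformer_text_input(model, enabled):
    """Temporarily set model.qformer_text_input, restoring the old value on exit.

    Hypothetical helper: relies only on the attribute set in the README snippet.
    """
    previous = model.qformer_text_input
    model.qformer_text_input = enabled
    try:
        yield model
    finally:
        model.qformer_text_input = previous


def compare_qformer_settings(model, samples, **generate_kwargs):
    """Call model.generate once with the flag off and once with it on.

    Assumes model.generate(samples, **kwargs) exists; returns a dict keyed
    by the flag value (False, True) so the two outputs can be inspected.
    """
    outputs = {}
    for enabled in (False, True):
        with qformer_text_input(model, enabled):
            outputs[enabled] = model.generate(samples, **generate_kwargs)
    return outputs
```

With the `vlm` object from the README, usage would look like `compare_qformer_settings(vlm, samples)`, where `samples` is whatever input dictionary you already pass to `vlm.generate`. The context manager restores the original flag afterwards, so the comparison does not silently change later calls.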