bokesyo committed on
Commit 817d582
1 Parent(s): 80c6598

Update README.md

Files changed (1)
  1. README.md +63 -3
README.md CHANGED
@@ -3,7 +3,7 @@ language:
  - en
  ---

- # MiniCPM-Visual-Embedding: An OCR-free Visual Document Embedding Model Based on MiniCPM-V-2.0

  With MiniCPM-Visual-Embedding, you can build a knowledge base directly from raw PDFs, books, and other documents without any OCR technique or OCR pipeline. The model takes only images as document-side inputs and produces vectors representing document pages.

@@ -14,7 +14,7 @@ With MiniCPM-Visual-Embedding, it is possible to directly build knowledge base w

  # News

- - 2024-06-27: We released our first visual embedding model on huggingface.

- - 2024-05-08: We released our training code (full-parameter tuning with GradCache and DeepSpeed, supports large batch size across multiple GPUs with zero-stage1) and eval code.
 
@@ -3,7 +3,7 @@ language:
  - en
  ---

+ # MiniCPM-Visual-Embedding: An OCR-free Visual-Based Document Embedding Model Built on MiniCPM-V-2.0 to Serve as Your Personal Librarian

  With MiniCPM-Visual-Embedding, you can build a knowledge base directly from raw PDFs, books, and other documents without any OCR technique or OCR pipeline. The model takes only images as document-side inputs and produces vectors representing document pages.

@@ -14,7 +14,67 @@ With MiniCPM-Visual-Embedding, it is possible to directly build knowledge base w

  # News

+ - 2024-06-27: We released our first visual embedding model, minicpm-visual-embedding-v0.1, on [Hugging Face](https://huggingface.co/RhapsodyAI/minicpm-visual-embedding-v0.1).

+ - 2024-05-08: We [published](https://github.com/bokesyo/minicpm-visual-embedding) our training code (full-parameter tuning with GradCache and DeepSpeed ZeRO stage 1, supporting large batch sizes across multiple GPUs) and evaluation code.

+ # Get started
+
+ First, clone this Hugging Face repo with Git LFS, or download it with `huggingface-cli`:
+
+ ```bash
+ git lfs install
+ git clone https://huggingface.co/RhapsodyAI/minicpm-visual-embedding-v0.1
+ ```
+
+ or
+
+ ```bash
+ huggingface-cli download RhapsodyAI/minicpm-visual-embedding-v0.1 --local-dir /local/path/to/minicpm-visual-embedding-v0.1
+ ```
+
+ Then load the model from that local directory:
+
+ ```python
+ from PIL import Image
+ import torch
+ from torch import Tensor
+ from transformers import AutoModel, AutoTokenizer
+
+ device = 'cuda:0'
+
+ def last_token_pool(last_hidden_states: Tensor,
+                     attention_mask: Tensor) -> Tensor:
+     # Pool each sequence into one vector: the hidden state of its last real (non-padding) token.
+     left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
+     if left_padding:
+         # Every row ends with a real token, so the last position is the one we want.
+         return last_hidden_states[:, -1]
+     else:
+         # Right padding: the last real token sits at index (number of real tokens - 1).
+         sequence_lengths = attention_mask.sum(dim=1) - 1
+         batch_size = last_hidden_states.shape[0]
+         return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths]
+
+
+ # The forward(text=..., image=..., tokenizer=...) call comes from the repo's own modeling code,
+ # so loading it needs trust_remote_code=True.
+ tokenizer = AutoTokenizer.from_pretrained('/local/path/to/minicpm-visual-embedding-v0.1', trust_remote_code=True)
+ model = AutoModel.from_pretrained('/local/path/to/minicpm-visual-embedding-v0.1', trust_remote_code=True)
+ model.to(device)
+ model.eval()
+
+ image_1 = Image.open('/local/path/to/document1.png').convert('RGB')
+ image_2 = Image.open('/local/path/to/document2.png').convert('RGB')
+
+ # Instruction prefix kept verbatim (including its spelling), since it presumably must match the training prompt.
+ query_instruction = 'Represent this query for retrieving relavant document: '
+
+ query = 'Who was elected as president of the United States in 2020?'
+
+ query_full = query_instruction + query
+
+ with torch.no_grad():
+     # Embed text queries: one text and one (empty) image slot per batch item
+     q_outputs = model(text=[query_full], image=[None], tokenizer=tokenizer) # [B, s, d]
+     q_reps = last_token_pool(q_outputs.last_hidden_state, q_outputs.attention_mask) # [B, d]
+
+     # Embed image documents: an empty text paired with each page image
+     p_outputs = model(text=['', ''], image=[image_1, image_2], tokenizer=tokenizer) # [B, s, d]
+     p_reps = last_token_pool(p_outputs.last_hidden_state, p_outputs.attention_mask) # [B, d]
+
+ # Calculate similarities: [1, d] x [d, 2] -> [1, 2], one score per (query, page) pair
+ scores = torch.matmul(q_reps, p_reps.T)
+
+ print(scores)
+ ```
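To make the padding logic in `last_token_pool` concrete, here is a dependency-free sketch of the same index selection on plain Python lists. The function name `last_token_index` and the toy masks are hypothetical and exist only for illustration; the README's function works on torch tensors and returns hidden states rather than indices.

```python
def last_token_index(attention_mask: list[list[int]]) -> list[int]:
    """Return, per row, the position whose hidden state last-token pooling picks.

    Mirrors the branching in last_token_pool: if every row has a real token
    (mask value 1) in the final position, the batch is treated as left-padded and
    the last position is used; otherwise it is treated as right-padded and the
    index is (number of real tokens - 1).
    """
    seq_len = len(attention_mask[0])
    left_padding = all(row[-1] == 1 for row in attention_mask)
    if left_padding:
        return [seq_len - 1] * len(attention_mask)
    return [sum(row) - 1 for row in attention_mask]


# Right-padded batch: real tokens first, padding zeros at the end.
right_padded = [
    [1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1],
]
# Left-padded batch: padding zeros first, real tokens at the end.
left_padded = [
    [0, 0, 1, 1, 1],
    [1, 1, 1, 1, 1],
]

print(last_token_index(right_padded))
print(last_token_index(left_padded))
```

In both cases the selected position is the last real token of each sequence, which is why the same pooled vector comes out regardless of which side the tokenizer pads on.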
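The last step of the example scores pages with a raw dot product between the query and page embeddings. The README does not say whether the embeddings are normalized, so the sketch below L2-normalizes both sides first (cosine similarity, a swapped-in choice that is not necessarily what the authors intend) and then ranks pages best-first. `cosine`, `rank_pages`, and the toy 3-dimensional vectors are hypothetical; with real outputs you would pass `q_reps[0].tolist()` and `p_reps.tolist()`.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two non-zero vectors (dot product of their L2-normalized forms)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_pages(query: list[float], pages: list[list[float]], top_k: int = 3) -> list[tuple[int, float]]:
    """Return (page_index, score) pairs for the top_k most similar pages, best first."""
    scored = [(i, cosine(query, page)) for i, page in enumerate(pages)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings standing in for q_reps / p_reps.
query = [1.0, 0.0, 1.0]
pages = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [2.0, 0.0, 2.0],  # same direction as the query, larger norm
    [1.0, 1.0, 0.0],  # partly aligned with the query
]

for index, score in rank_pages(query, pages, top_k=2):
    print(index, round(score, 3))
```

With a raw dot product, page 1 would still rank first here, but with a score of 4.0 instead of 1.0 because its vector is longer; normalizing removes that length effect so scores stay in [-1, 1].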