Alibaba-NLP/gte-multilingual-base

tomaarsen (HF staff) committed on Sep 17

Commit 7fc0678 • 1 parent: f7d567e

Fix broken SentenceTransformer snippet; format code with Python format (#11)

- Fix broken SentenceTransformer snippet; format code with Python format (4f98c8de229b79178923ab4b65fa661c1dbf7b9e)

Co-authored-by: Tom Aarsen <[email protected]>
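The "broken" snippet is broken in a literal sense. On the removed line `model = SentenceTransformer(', trust_remote_code=True)`, the quote opens a string literal that is never closed, so Python rejects the whole README example before running any of it. A minimal way to confirm this is sketched below with the standard library's `ast` module. The two strings are copied from the diff; the helper name `parses` is only for illustration.

```python
import ast

# Removed line: the lone quote starts a string literal that never terminates.
OLD_LINE = "model = SentenceTransformer(', trust_remote_code=True)"
# Added line: passes the model id variable instead of a broken literal.
NEW_LINE = "model = SentenceTransformer(model_name_or_path, trust_remote_code=True)"


def parses(source: str) -> bool:
    """Return True if `source` is syntactically valid Python (nothing is executed)."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


print(parses(OLD_LINE))  # False
print(parses(NEW_LINE))  # True
```

Because `ast.parse` only checks syntax, the new line passes even though `SentenceTransformer` and `model_name_or_path` are undefined here.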

Files changed (1)

README.md CHANGED

@@ -4660,7 +4660,7 @@ refer to [enable-unpadding-and-xformers](https://huggingface.co/Alibaba-NLP/new-
 ### Get Dense Embeddings with Transformers
-```
+```python
 # Requires transformers>=4.36.0
 import torch.nn.functional as F
@@ -4693,12 +4693,10 @@ print(scores.tolist())
 ```
 ### Use with sentence-transformers
-```
+```python
 # Requires sentences-transformers>=3.0.0
 from sentence_transformers import SentenceTransformer
-from sentence_transformers.util import cos_sim
-import numpy as np
 input_texts = [
     "what is the capital of China?",
@@ -4708,24 +4706,18 @@ input_texts = [
 ]
 model_name_or_path="Alibaba-NLP/gte-multilingual-base"
-model = SentenceTransformer(', trust_remote_code=True)
-embeddings = model.encode(input_texts) # embeddings.shape (4, 768)
-# normalized embeddings
-norms = np.linalg.norm(embeddings, ord=2, axis=1, keepdims=True)
-norms[norms == 0] = 1
-embeddings = embeddings / norms
+model = SentenceTransformer(model_name_or_path, trust_remote_code=True)
+embeddings = model.encode(input_texts, normalize_embeddings=True) # embeddings.shape (4, 768)
 # sim scores
-scores = (embeddings[:1] @ embeddings[1:].T)
+scores = model.similarity(embeddings[:1], embeddings[1:])
 print(scores.tolist())
 # [[0.301699697971344, 0.7503870129585266, 0.32030850648880005]]
 ```
 ### Use with custom code to get dense embeddigns and sparse token weights
-```
+```python
 # You can find the script gte_embedding.py in https://huggingface.co/Alibaba-NLP/gte-multilingual-base/blob/main/scripts/gte_embedding.py
 from gte_embedding import GTEEmbeddidng
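The rewritten sentence-transformers example is meant to print the same scores as the NumPy code it replaces. The old version L2-normalized each row by hand (leaving all-zero rows untouched) and took a dot product. The new version asks `encode` for unit-length vectors with `normalize_embeddings=True` and calls `model.similarity`, which uses cosine similarity unless a model is configured with a different similarity function; that this model keeps the cosine default is an assumption here, not something checked against its config. On unit vectors, cosine similarity and the dot product coincide. The sketch below checks that with random stand-in vectors rather than real model output, so it runs without downloading the model; only the (4, 768) shape comes from the README comment, and `cosine_matrix` is a hypothetical helper, not a sentence-transformers function.

```python
import numpy as np

rng = np.random.default_rng(0)
# Random stand-in for model.encode(input_texts): 4 texts x 768 dims.
embeddings = rng.normal(size=(4, 768))

# Old README approach: L2-normalize each row by hand, leaving all-zero rows unchanged.
norms = np.linalg.norm(embeddings, ord=2, axis=1, keepdims=True)
norms[norms == 0] = 1
unit = embeddings / norms
old_scores = unit[:1] @ unit[1:].T  # shape (1, 3): first text vs. the other three


def cosine_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Hypothetical helper: pairwise cosine similarity between rows of a and rows of b."""
    a_norm = np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = np.linalg.norm(b, axis=1, keepdims=True)
    return (a @ b.T) / (a_norm * b_norm.T)


# Cosine similarity on already-normalized vectors, mirroring
# normalize_embeddings=True followed by a cosine model.similarity.
new_scores = cosine_matrix(unit[:1], unit[1:])

print(old_scores.shape)                                # (1, 3)
print(np.allclose(old_scores, new_scores, atol=1e-6))  # True
```

Since cosine similarity already divides by the vector norms, `cosine_matrix(embeddings[:1], embeddings[1:])` on the raw vectors gives the same (1, 3) matrix; normalizing at encode time mainly matters if the embeddings are later compared with a plain dot product.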