Google DeepMind introduces Gecko, a new text embedding model! Gecko uses a two-step distillation process that leverages synthetic data generation and LLM-based reranking.
Key points:
* Uses an LLM to generate diverse synthetic queries and tasks from web passages
* Refines the data by retrieving candidate passages and relabeling positives/negatives using the same LLM
* Achieves strong results on the Massive Text Embedding Benchmark (MTEB), where the compact 256-dimensional Gecko outperforms existing 768-dimensional models.
* The 768-dimensional Gecko achieves state-of-the-art performance, competing with models that are far larger.
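The two-step recipe above can be sketched roughly as follows. This is a hedged illustration, not the paper's implementation: the function names, the toy word-overlap scorer standing in for the LLM, and the corpus are all hypothetical stand-ins.

```python
# Sketch of a Gecko-style two-step data pipeline (hypothetical stand-ins).

def llm_generate_task_and_query(passage):
    # Step 1 (mocked): the paper's LLM reads a web passage and emits a
    # task description plus a synthetic query; here we fake it.
    return "question answering", f"what does this say: {passage[:20]}"

def llm_score(query, passage):
    # Mocked relevance scorer: the paper reuses the same LLM to rank
    # candidates; here shared lowercase words serve as a toy proxy.
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p)

def build_training_example(seed_passage, corpus):
    task, query = llm_generate_task_and_query(seed_passage)
    # Step 2: retrieve candidate passages and relabel them with the LLM.
    # The top-scoring passage becomes the positive (it may differ from
    # the seed passage), and a low-scoring one becomes a hard negative.
    ranked = sorted(corpus, key=lambda p: llm_score(query, p), reverse=True)
    positive, hard_negative = ranked[0], ranked[-1]
    return {"task": task, "query": query,
            "positive": positive, "negative": hard_negative}

corpus = [
    "Gecko distills knowledge from large language models into embeddings.",
    "The weather in Paris is mild in spring.",
    "Synthetic queries are generated from web passages by an LLM.",
]
example = build_training_example(corpus[0], corpus)
```

The key idea the sketch preserves is the relabeling step: the positive is chosen by the scorer over the whole candidate pool rather than assumed to be the seed passage.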
Paper: Gecko: Versatile Text Embeddings Distilled from Large Language Models (2403.20327)
More details in my blog: https://huggingface.co/blog/vladbogo/gecko
Congrats to the authors for their work!