Sentence Transformers
This task lets you train or fine-tune a Sentence Transformer model on your own dataset.
AutoTrain supports the following types of Sentence Transformer fine-tuning:
- pair: dataset with two sentences: anchor and positive
- pair_class: dataset with two sentences, premise and hypothesis, and a target label
- pair_score: dataset with two sentences, sentence1 and sentence2, and a target score
- triplet: dataset with three sentences: anchor, positive and negative
- qa: dataset with two sentences: query and answer
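The column requirements listed above can be captured in a small lookup table. The sketch below is illustrative only: `REQUIRED_COLUMNS` and `missing_columns` are hypothetical names, not part of AutoTrain, and it assumes your file uses the column names shown on this page.

```python
# Hypothetical helper, not part of AutoTrain: maps each trainer type
# to the column names listed on this page.
REQUIRED_COLUMNS = {
    "pair": ["anchor", "positive"],
    "pair_class": ["premise", "hypothesis", "label"],
    "pair_score": ["sentence1", "sentence2", "score"],
    "triplet": ["anchor", "positive", "negative"],
    "qa": ["query", "answer"],
}


def missing_columns(trainer_type, columns):
    """Return the required columns that are absent from `columns`."""
    required = REQUIRED_COLUMNS[trainer_type]
    present = set(columns)
    return [name for name in required if name not in present]
```

For example, `missing_columns("triplet", ["anchor", "positive"])` returns `["negative"]`, which points to the missing column before a training job is started.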
Data Format
Sentence Transformers fine-tuning accepts data in CSV or JSONL format. You can also use a dataset from the Hugging Face Hub.
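As a rough illustration of the two file formats, the sketch below serializes the same rows as CSV text and as JSONL text using only the Python standard library. The helper names `to_csv_text` and `to_jsonl_text` are hypothetical, and the `anchor` and `positive` columns are simply the ones used for pair training.

```python
import csv
import io
import json


def to_csv_text(rows, fieldnames):
    """Serialize a list of dicts as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_jsonl_text(rows):
    """Serialize a list of dicts as JSONL: one JSON object per line."""
    return "".join(json.dumps(row, ensure_ascii=False) + "\n" for row in rows)


rows = [
    {"anchor": "hello", "positive": "hi"},
    {"anchor": "how are you", "positive": "I am fine"},
]
csv_text = to_csv_text(rows, ["anchor", "positive"])
jsonl_text = to_jsonl_text(rows)
```

Either string can then be written to a `.csv` or `.jsonl` file. Going through `csv.DictWriter` rather than joining strings by hand means sentences that contain commas or quotes are escaped correctly.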
pair
For pair training, the data should be in the following format:
anchor | positive |
---|---|
hello | hi |
how are you | I am fine |
What is your name? | My name is Abhishek |
Which is the best programming language? | Python |
pair_class
For pair_class training, the data should be in the following format:
premise | hypothesis | label |
---|---|---|
hello | hi | 1 |
how are you | I am fine | 0 |
What is your name? | My name is Abhishek | 1 |
Which is the best programming language? | Python | 1 |
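Before training on labelled pairs, it can help to look at how the labels are distributed. The sketch below counts label values with `collections.Counter`; the `label_counts` helper is a hypothetical name, and the rows are copied from the table above.

```python
from collections import Counter


def label_counts(rows, label_column="label"):
    """Count how many rows carry each label value."""
    return Counter(row[label_column] for row in rows)


# Rows copied from the pair_class table above.
rows = [
    {"premise": "hello", "hypothesis": "hi", "label": 1},
    {"premise": "how are you", "hypothesis": "I am fine", "label": 0},
    {"premise": "What is your name?", "hypothesis": "My name is Abhishek", "label": 1},
    {"premise": "Which is the best programming language?", "hypothesis": "Python", "label": 1},
]
counts = label_counts(rows)
print(counts[1], counts[0])  # prints: 3 1
```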
pair_score
For pair_score training, the data should be in the following format:
sentence1 | sentence2 | score |
---|---|---|
hello | hi | 0.8 |
how are you | I am fine | 0.2 |
What is your name? | My name is Abhishek | 0.9 |
Which is the best programming language? | Python | 0.7 |
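Because the score column is numeric, a quick check that every value parses as a number can catch formatting slips early, especially in hand-edited CSV files where values arrive as strings. This is a minimal sketch: `invalid_score_rows` is a hypothetical helper, and it makes no assumption about the valid range of scores.

```python
def invalid_score_rows(rows, score_column="score"):
    """Return the indices of rows whose score is missing or not a number."""
    bad = []
    for index, row in enumerate(rows):
        try:
            float(row[score_column])
        except (KeyError, TypeError, ValueError):
            bad.append(index)
    return bad


rows = [
    {"sentence1": "hello", "sentence2": "hi", "score": "0.8"},
    {"sentence1": "how are you", "sentence2": "I am fine", "score": "high"},
    {"sentence1": "What is your name?", "sentence2": "My name is Abhishek"},
]
bad_rows = invalid_score_rows(rows)
```

Here `bad_rows` is `[1, 2]`: the second row has a non-numeric score and the third has no score at all.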
triplet
For triplet training, the data should be in the following format:
anchor | positive | negative |
---|---|---|
hello | hi | bye |
how are you | I am fine | I am not fine |
What is your name? | My name is Abhishek | What's it to you? |
Which is the best programming language? | Python | JavaScript |
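A triplet only carries a training signal if the negative differs from both the anchor and the positive. The sketch below flags rows where that is not the case. It reflects an assumption about what makes a useful triplet, not a rule that AutoTrain is known to enforce, and `degenerate_triplets` is a hypothetical helper name.

```python
def degenerate_triplets(rows):
    """Return indices of rows whose negative equals the anchor or the positive.

    The comparison ignores case and surrounding whitespace.
    """
    bad = []
    for index, row in enumerate(rows):
        anchor = row["anchor"].strip().lower()
        positive = row["positive"].strip().lower()
        negative = row["negative"].strip().lower()
        if negative in (anchor, positive):
            bad.append(index)
    return bad


rows = [
    {"anchor": "hello", "positive": "hi", "negative": "bye"},
    {"anchor": "how are you", "positive": "I am fine", "negative": "i am fine "},
]
flagged = degenerate_triplets(rows)
```

In this example `flagged` is `[1]`, because the second row repeats the positive as its negative.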
qa
For qa training, the data should be in the following format:
query | answer |
---|---|
hello | hi |
how are you | I am fine |
What is your name? | My name is Abhishek |
Which is the best programming language? | Python |