Steps to reproduce the model on a legal dataset

by wilfoderek - opened Dec 7, 2023

Dec 7, 2023

Amazing work, my friend!
Could you please share the necessary steps, or provide any documentation, that would enable us to replicate the experiment in the legal domain?
Thanks in advance, my friends.

wilfoderek changed the discussion title from "Steps to reproduce in a legal dataset" to "Steps to reproduce the model on a legal dataset" Dec 7, 2023

SeanLee97

WhereIsAI org Dec 8, 2023

Many thanks for following our work.
Our secret is angle optimization. You can build on our model and fine-tune it on your own data with angle optimization.
We provide a friendly training interface: prepare your data, and you can train your model with a few lines of code. Refer to https://github.com/SeanLee97/AnglE#2-custom-train.
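To make "angle optimization" a bit more concrete: the AnglE approach treats each embedding as a vector in complex space and optimizes the angle difference between the two texts of a pair, using a ranking objective in which pairs with higher gold similarity should end up with smaller angle differences. The following is a simplified, stdlib-only Python sketch of that idea for illustration only; it is not the code in the AnglE library, whose exact formulation differs. The first-half/second-half real/imaginary split, the mean absolute phase difference, the `tau=0.05` default, and all function names are assumptions made here.

```python
import cmath
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def to_complex(vec: Vector) -> List[complex]:
    """Split a real vector of length 2k into k complex numbers.

    Assumption for illustration: the first half holds the real parts and
    the second half holds the imaginary parts.
    """
    if not vec or len(vec) % 2 != 0:
        raise ValueError("embedding length must be even and non-zero")
    k = len(vec) // 2
    return [complex(vec[i], vec[k + i]) for i in range(k)]


def angle_difference(u: Vector, v: Vector) -> float:
    """Mean absolute phase difference between paired complex components.

    cmath.phase(a * conj(b)) gives arg(a) - arg(b) in [-pi, pi] without
    dividing, so a zero component in v does not raise.
    """
    zu, zv = to_complex(u), to_complex(v)
    diffs = [abs(cmath.phase(a * b.conjugate())) for a, b in zip(zu, zv)]
    return sum(diffs) / len(diffs)


def angle_ranking_loss(
    pairs: Sequence[Tuple[Vector, Vector]],
    labels: Sequence[float],
    tau: float = 0.05,
) -> float:
    """CoSENT-style ranking loss on angle differences:

        log(1 + sum over (i, j) with labels[i] > labels[j] of exp((d_i - d_j) / tau))

    where d_i is the angle difference of pair i. The loss is small when
    pairs with higher gold similarity have smaller angle differences.
    """
    if len(pairs) != len(labels):
        raise ValueError("pairs and labels must have the same length")
    d = [angle_difference(u, v) for u, v in pairs]
    n = len(d)
    terms = [
        (d[i] - d[j]) / tau
        for i in range(n)
        for j in range(n)
        if labels[i] > labels[j]
    ]
    # Numerically stable log(exp(0) + sum(exp(t))): shift by the largest exponent.
    m = max([0.0] + terms)
    return m + math.log(math.exp(-m) + sum(math.exp(t - m) for t in terms))


if __name__ == "__main__":
    same = ([1.0, 0.0], [1.0, 0.0])        # identical vectors: angle difference 0
    orthogonal = ([1.0, 0.0], [0.0, 1.0])  # 1+0j vs 0+1j: angle difference pi/2
    print(angle_ranking_loss([same, orthogonal], [5.0, 0.0]))  # near 0: correct ranking
    print(angle_ranking_loss([same, orthogonal], [0.0, 5.0]))  # about 31.4: wrong ranking
```

The loss only looks at the relative order of pairs, so labels on any similarity scale work. For actual fine-tuning, use the training interface linked above rather than this sketch.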

SeanLee97 changed discussion status to closed Dec 14, 2023

wilfoderek

Dec 14, 2023

I would like to test it in Spanish.
How can I achieve this? Any suggestions are welcome.

SeanLee97

WhereIsAI org Dec 15, 2023

Unfortunately, UAE was fine-tuned only on English datasets.

For Spanish, I know of a semantic textual similarity (STS) dataset, SemEval-2015 Task 2. You could train on it using AnglE and evaluate the resulting model's performance. xlm-roberta-large is a good choice of backbone model.

To improve generalization, you should also collect more training data.
