AmelieSchreiber committed
Commit 66fb9b9 • 1 Parent(s): b73ac0e
Update README.md
README.md CHANGED
@@ -45,7 +45,7 @@ This model was trained on approximately 70,000 proteins with binding site and ac
 The training split was a random 85/15 split for this version, and does not consider anything in the way of family or sequence
 similarity. New iterations of the model have been trained on larger datasets (over 200,000 proteins), with the split such that
 there are no overlapping families, however they seem to overfit much earlier and have significantly worse performance in terms
-of the training metrics (precision, recall, and F1).
+of the training metrics (precision, recall, and F1). To address this we plan to implement LoRA (and hopefully QLoRA).
 
 Training Metrics for the Model in the form of the `trainer_state.json` can be
 [found here](https://huggingface.co/AmelieSchreiber/esm2_t6_8M_general_binding_sites_v2/blob/main/trainer_state.json).
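
For reference on the LoRA/QLoRA plan added in this commit: below is a minimal sketch (not the author's published training code) of how LoRA adapters could be attached to an ESM-2 token-classification model with the Hugging Face `peft` library. The base checkpoint, label count, rank, and target module names are illustrative assumptions.

```python
# Minimal LoRA sketch for an ESM-2 binding-site (token classification) model.
# Assumptions: base checkpoint, num_labels, and LoRA hyperparameters are illustrative.
from transformers import AutoModelForTokenClassification, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

base_checkpoint = "facebook/esm2_t6_8M_UR50D"  # assumed ESM-2 8M base model
tokenizer = AutoTokenizer.from_pretrained(base_checkpoint)
model = AutoModelForTokenClassification.from_pretrained(
    base_checkpoint,
    num_labels=2,  # binding site vs. non-binding site (assumed labeling scheme)
)

lora_config = LoraConfig(
    task_type=TaskType.TOKEN_CLS,
    r=8,                                # illustrative low rank
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["query", "value"],  # ESM attention projection layers
)

# Wrap the base model so only the small LoRA adapter weights are trainable,
# the usual way LoRA is used to curb overfitting on larger datasets.
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

The wrapped model can then be passed to a standard `transformers` `Trainer` as before; for QLoRA the base model would additionally be loaded in 4-bit (e.g. via `bitsandbytes`) before attaching the adapters.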