AmelieSchreiber committed
Commit 66fb9b9 • 1 Parent(s): b73ac0e
Update README.md
README.md CHANGED
@@ -45,7 +45,7 @@ This model was trained on approximately 70,000 proteins with binding site and ac
 The training split was a random 85/15 split for this version, and does not consider anything in the way of family or sequence
 similarity. New iterations of the model have been trained on larger datasets (over 200,000 proteins), with the split such that
 there are no overlapping families, however they seem to overfit much earlier and have significantly worse performance in terms
-of the training metrics (precision, recall, and F1).
+of the training metrics (precision, recall, and F1). To address this we plan to implement LoRA (and hopefully QLoRA).
 
 Training Metrics for the Model in the form of the `trainer_state.json` can be
 [found here](https://huggingface.co/AmelieSchreiber/esm2_t6_8M_general_binding_sites_v2/blob/main/trainer_state.json).
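
For reference on the LoRA/QLoRA plan added in this commit: below is a minimal sketch (not the author's published training code) of how LoRA adapters could be attached to an ESM-2 token-classification model with the Hugging Face `peft` library. The base checkpoint, label count, rank, and target module names are illustrative assumptions.

```python
# Minimal LoRA sketch for an ESM-2 binding-site (token classification) model.
# Assumptions: base checkpoint, num_labels, and LoRA hyperparameters are illustrative.
from transformers import AutoModelForTokenClassification, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

base_checkpoint = "facebook/esm2_t6_8M_UR50D"  # assumed ESM-2 8M base model
tokenizer = AutoTokenizer.from_pretrained(base_checkpoint)
model = AutoModelForTokenClassification.from_pretrained(
    base_checkpoint,
    num_labels=2,  # binding site vs. non-binding site (assumed labeling scheme)
)

lora_config = LoraConfig(
    task_type=TaskType.TOKEN_CLS,
    r=8,                                # illustrative low rank
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["query", "value"],  # ESM attention projection layers
)

# Wrap the base model so only the small LoRA adapter weights are trainable,
# the usual way LoRA is used to curb overfitting on larger datasets.
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

The wrapped model can then be passed to a standard `transformers` `Trainer` as before; for QLoRA the base model would additionally be loaded in 4-bit (e.g. via `bitsandbytes`) before attaching the adapters.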