monsoon-nlp/protein-matryoshka-embeddings
Sentence Similarity
Embeddings and NLG related to biology / amino acid sequences
Note Faster embeddings for proteins from UniProt; see the explainer blog post: https://huggingface.co/blog/monsoon-nlp/proteins-matryoshka-embeddings
Note Retinopathy classifier
Note TinyLLaMA-1.1B continued pretraining on 50% quinoa proteins + 50% simulated science textbooks
Note tinyllama-mixpretrain-quinoa-sciphi LoRA, using finetuning split minus maize/corn/Zea
Note TBD: long-context LLaMA 3 LoRA finetuning on 16 million nucleotides of the Kañiwa genome
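
As a rough illustration of why the Matryoshka protein embeddings noted above can be "faster": Matryoshka-style training generally aims to make the leading dimensions of a vector usable on their own, so a shorter prefix can stand in for the full embedding during similarity search. The sketch below is a generic, standard-library-only example of that truncate-and-renormalize step. The vectors are made-up toy values, not output from protein-matryoshka-embeddings, and the helper names `truncate_embedding` and `cosine_similarity` are hypothetical; the model's real dimensions and loading code are not given in this listing.

```python
import math


def truncate_embedding(vec, dim):
    """Keep the first `dim` components of `vec` and L2-normalize them."""
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)  # all-zero prefix: nothing to normalize
    return [x / norm for x in head]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy 8-dimensional vectors standing in for two protein embeddings (assumed values).
protein_a = [0.9, 0.1, 0.3, -0.2, 0.05, 0.4, -0.1, 0.2]
protein_b = [0.8, 0.2, 0.25, 0.1, -0.3, 0.1, 0.3, -0.2]

# Compare the pair at progressively smaller prefix sizes.
for dim in (8, 4, 2):
    a = truncate_embedding(protein_a, dim)
    b = truncate_embedding(protein_b, dim)
    print(dim, round(cosine_similarity(a, b), 3))
```

Shorter prefixes cost less to store and compare; how much accuracy they retain depends on the model and is not something this toy example measures.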