This repo includes all the files needed to stage an Inference Endpoints API with DeepSparse, as discussed in this blog. This DistilBERT model was sparsified using the SparseML library.

Sparse Transfer 80% VNNI Pruned DistilBERT

This model is the result of pruning the DistilBERT model to 80% sparsity using VNNI blocking (semi-structured), followed by fine-tuning and quantization on the SST2 dataset. Pruning is performed with the gradual magnitude pruning (GMP) algorithm on the masked language modeling task, using the BookCorpus and Wikipedia datasets. The model achieves 90.5% accuracy on the validation dataset, recovering over 99% of the baseline model's accuracy. See the included recipe for training instructions.
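
To make the pruning idea concrete, here is a minimal pure-Python sketch of block-wise magnitude pruning with a gradually increasing sparsity target. It is not SparseML's implementation: the function names `cubic_sparsity` and `prune_blocks` are hypothetical, the cubic schedule is one common choice for GMP rather than something taken from the included recipe, and the block size of four consecutive weights is an assumption about what VNNI blocking means here.

```python
def cubic_sparsity(step, total_steps, final_sparsity=0.8, init_sparsity=0.0):
    """Sparsity target at `step`: rises quickly at first, then flattens out.

    Assumed schedule (cubic); the actual recipe may use a different one.
    """
    progress = min(max(step / total_steps, 0.0), 1.0)
    return final_sparsity + (init_sparsity - final_sparsity) * (1.0 - progress) ** 3


def prune_blocks(weights, sparsity, block=4):
    """Zero the lowest-magnitude blocks of `block` consecutive weights.

    Each block is scored by the sum of absolute values, so weights are
    kept or removed together (semi-structured sparsity).
    """
    if len(weights) % block != 0:
        raise ValueError("length must be a multiple of the block size")
    blocks = [weights[i:i + block] for i in range(0, len(weights), block)]
    scores = [sum(abs(w) for w in b) for b in blocks]
    n_prune = int(round(sparsity * len(blocks)))
    # Indices of the blocks with the smallest scores are pruned first.
    order = sorted(range(len(blocks)), key=lambda i: scores[i])
    pruned = set(order[:n_prune])
    out = []
    for i, b in enumerate(blocks):
        out.extend([0.0] * block if i in pruned else b)
    return out


# Toy example: five blocks of four weights, pruned at the final 80% target.
weights = [0.1] * 4 + [0.9] * 4 + [0.2] * 4 + [0.3] * 4 + [0.05] * 4
pruned = prune_blocks(weights, cubic_sparsity(10, 10))
```

In real GMP training, the pruning step would be reapplied at intervals while the model keeps training, so the remaining weights can compensate for the removed ones; the sketch only shows the masking step and the schedule.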

