This repo includes all the files needed to stage an Inference Endpoints API with DeepSparse, as discussed in this blog. This DistilBERT model was sparsified using the SparseML library.
Sparse Transfer 80% VNNI Pruned DistilBERT
This model is the result of pruning the DistilBERT model to 80% sparsity using VNNI blocking (semi-structured), followed by fine-tuning and quantization on the SST2 dataset. Pruning is performed with the gradual magnitude pruning (GMP) algorithm on the masked language modeling task, using the BookCorpus and Wikipedia datasets. The model achieves 90.5% accuracy on the validation dataset, recovering over 99% of the baseline model's accuracy. See the included recipe for training instructions.
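To make the idea of semi-structured (blocked) pruning concrete, here is a minimal sketch in plain Python. It is not the SparseML implementation and not the recipe used for this model. It assumes that blocking groups four consecutive weights, and that each block is scored by the sum of its absolute values; the function names `prune_blocks` and `sparsity_of` are hypothetical and exist only for this illustration. It also prunes in a single step, whereas GMP raises the sparsity gradually over training.

```python
def prune_blocks(weights, sparsity=0.8, block=4):
    """Zero the lowest-magnitude blocks of `block` consecutive weights.

    Illustrative only: block size 4 and sum-of-absolute-values scoring
    are assumptions, not details taken from the model's recipe.
    """
    if len(weights) % block != 0:
        raise ValueError("weight count must be a multiple of the block size")
    # Split the flat weight list into consecutive blocks.
    blocks = [weights[i:i + block] for i in range(0, len(weights), block)]
    # Score each block by the sum of the absolute values of its weights.
    scores = [sum(abs(w) for w in b) for b in blocks]
    # Number of whole blocks to remove to reach the target sparsity.
    n_prune = int(round(sparsity * len(blocks)))
    # Pick the n_prune lowest-scoring blocks.
    order = sorted(range(len(blocks)), key=lambda i: scores[i])
    pruned = set(order[:n_prune])
    out = []
    for i, b in enumerate(blocks):
        # A pruned block is zeroed as a unit; a kept block is left unchanged.
        out.extend([0.0] * block if i in pruned else b)
    return out


def sparsity_of(weights):
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)
```

Because zeros always come in whole blocks, a runtime can skip an entire block at once, which is the motivation for pruning in blocks rather than weight by weight.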