Model Card
Model Information
This repository provides the checkpoint of Mistral-7B-Instruct-v0.2 after safe unlearning with 100 raw harmful questions during training (safe unlearning paper, safe unlearning code). This model is significantly more safe against various jailbreak attacks than the original model while maintaining comparable general performance.
Uses
The prompt format is the same as the original Mistral-7B-Instruct-v0.2, so you can use this model in the same way. Also refer to our Github Repository for example code.
- Downloads last month
- 24
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social
visibility and check back later, or deploy to Inference Endpoints (dedicated)
instead.