---
library_name: transformers
license: apache-2.0
pipeline_tag: text-generation
tags:
- ExLlamaV2
- 5bit
- Mistral
- Mistral-7B
- quantized
- exl2
- 5.0-bpw
---

# Model Card for alokabhishek/Mistral-7B-Instruct-v0.2-5.0-bpw-exl2

<!-- Provide a quick summary of what the model is/does. -->
This repo contains a 5-bit (5.0 bpw) quantized version of Mistral AI's Mistral-7B-Instruct-v0.2, produced with ExLlamaV2.



## Model Details

- Model creator: [Mistral AI](https://huggingface.co/mistralai)
- Original model: [Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)


### About quantization using ExLlamaV2

This model was quantized with ExLlamaV2 to an average of 5.0 bits per weight (bpw), as indicated by the `5.0-bpw-exl2` suffix in the repo name.

- ExLlamaV2 GitHub repo: [turboderp/exllamav2](https://github.com/turboderp/exllamav2)



# How to Get Started with the Model

Use the code below to get started with the model.


## How to run the model from Python code

#### First install the package
```shell
# Install ExLlamaV2 from source (editable install)
!git clone https://github.com/turboderp/exllamav2
!pip install -e exllamav2
```
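
As a quick sanity check that the install worked, you can try importing the package (a minimal sketch; the version attribute is assumed to be exposed and is not required for anything below):

```python
import exllamav2

# Print the installed version if the package exposes it, otherwise "unknown"
print(getattr(exllamav2, "__version__", "unknown"))
```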

#### Import 

```python
# Hugging Face Hub helpers (authentication and repo utilities)
from huggingface_hub import login, HfApi, create_repo

# Torch imports (not strictly required for the steps below)
from torch import bfloat16
import torch

import locale
import os
```
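
If the download step needs authentication (for example, for private or gated repos), the `login` helper imported above can be used. A minimal sketch, assuming your token is available as the `HF_TOKEN` environment variable:

```python
# Authenticate with the Hugging Face Hub (only needed for private/gated repos).
# HF_TOKEN is assumed to be provided via an environment variable; adapt as needed.
HF_TOKEN = os.environ.get("HF_TOKEN", "")
if HF_TOKEN:
    login(token=HF_TOKEN)
```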

#### Set up variables

```python
# Define the model ID for the desired model
model_id = "alokabhishek/Mistral-7B-Instruct-v0.2-5.0-bpw-exl2"
BPW = 5.0  # bits per weight of this quantization (informational)

# Derive the local folder name from the model ID
model_name = model_id.split("/")[-1]

```
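
The Python inference example further below hardcodes a model directory; one way to keep things consistent is to derive that path from the variables above (a sketch, assuming the model is cloned or downloaded into the current working directory):

```python
# Local directory the quantized model will be cloned/downloaded into
model_directory = os.path.join(os.getcwd(), model_name)
print(model_directory)
```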

#### Download the quantized model
```shell
!git lfs install
# Download the model to a local directory
# {username} and {HF_TOKEN} are placeholders for your Hugging Face username and access token
!git clone https://{username}:{HF_TOKEN}@huggingface.co/{model_id} {model_name}
```
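
Alternatively, the model files can be fetched without git by using `huggingface_hub`'s `snapshot_download` (a minimal sketch; `local_dir` is simply the folder name derived above):

```python
from huggingface_hub import snapshot_download

# Download all files of the repo into a local folder
snapshot_download(repo_id=model_id, local_dir=model_name)
```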

#### Run inference on the quantized model using test_inference.py
```shell
# Run the bundled test_inference.py script from the cloned exllamav2 repo
!python exllamav2/test_inference.py -m {model_name}/ -p "Tell me a funny joke about Large Language Models meeting a Blackhole in an intergalactic Bar."
```


#### Run inference using the ExLlamaV2 Python API

```python
from exllamav2 import (
    ExLlamaV2,
    ExLlamaV2Config,
    ExLlamaV2Cache,
    ExLlamaV2Tokenizer,
)

from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

import time

# Initialize model and cache

# Path to the downloaded quantized model; adjust to where the model was cloned/downloaded
model_directory = "/model_path/Mistral-7B-Instruct-v0.2-5.0-bpw-exl2/"
print("Loading model: " + model_directory)

config = ExLlamaV2Config(model_directory)
model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)
tokenizer = ExLlamaV2Tokenizer(config)

# Initialize generator

generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

# Generate some text

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.85
settings.top_k = 50
settings.top_p = 0.8
settings.token_repetition_penalty = 1.01
settings.disallow_tokens(tokenizer, [tokenizer.eos_token_id])

prompt = "Tell me a funny joke about Large Language Models meeting a Blackhole in an intergalactic Bar."

max_new_tokens = 512

generator.warmup()
time_begin = time.time()

output = generator.generate_simple(prompt, settings, max_new_tokens, seed=1234)

time_end = time.time()
time_total = time_end - time_begin

print(output)
print()
print(f"Response generated in {time_total:.2f} seconds")


```
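
Note that Mistral-7B-Instruct-v0.2 was trained with the `[INST] ... [/INST]` instruction format, so wrapping the prompt accordingly usually improves instruction following. A minimal sketch reusing the `generator`, `settings`, and `max_new_tokens` objects defined above:

```python
# Wrap the user message in Mistral's instruction template
user_message = "Tell me a funny joke about Large Language Models meeting a Blackhole in an intergalactic Bar."
prompt = f"[INST] {user_message} [/INST]"

output = generator.generate_simple(prompt, settings, max_new_tokens, seed=1234)
print(output)
```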

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

[More Information Needed]


### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

[More Information Needed]

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

[More Information Needed]


## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->


#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

[More Information Needed]

### Results

[More Information Needed]


## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]