---
license: apache-2.0
tags:
  - trl
  - orpo
  - generated_from_trainer
  - exl2
base_model: mistral-community/Mixtral-8x22B-v0.1
datasets:
  - argilla/distilabel-capybara-dpo-7k-binarized
model-index:
  - name: zephyr-orpo-141b-A35b-v0.1
    results: []
---
*Zephyr 141B logo*

# machinez/zephyr-orpo-141b-A35b-v0.1-exl2

This model was converted to EXL2 format from [HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1](https://huggingface.co/HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1). Refer to the original model card for more details on the model.

Each branch contains a different bits-per-weight quantization; the main branch contains only the measurement.json used for further conversions.

- **1.5 bits per weight** - fits dual RTX 3090/4090 or triple Nvidia Tesla P100 16 GB at 4k context
- **2.75 bits per weight** - fits quad Nvidia Tesla P100 16 GB at 16k context (see the VRAM estimate below)
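
As a rough sanity check, weights-only VRAM can be estimated as parameter count × bpw / 8. The sketch below is a back-of-envelope estimate only; it ignores KV cache, activations, and per-GPU overhead:

```python
# Back-of-envelope, weights-only VRAM estimate: params * bpw / 8 bits per byte.
# Ignores KV cache, activations, and per-GPU overhead.
TOTAL_PARAMS = 141e9  # ~141B total parameters

for bpw in (1.5, 2.75):
    weight_gb = TOTAL_PARAMS * bpw / 8 / 1e9
    print(f"{bpw} bpw: ~{weight_gb:.1f} GB of weights")

# 1.5 bpw  -> ~26.4 GB: fits 2x 24 GB (3090/4090) or 3x 16 GB (P100)
# 2.75 bpw -> ~48.5 GB: fits 4x 16 GB (P100), leaving some headroom
#             for the fp16 cache at 16k context
```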

Sample configuration to load in TabbyAPI at 1.5 bpw on 3x Nvidia Tesla P100 16 GB with 4k context (~14 tok/s):

```json
{
  "name": "machinez_zephyr-orpo-141b-A35b-v0.1_1.5bpw",
  "max_seq_len": 4096,
  "override_base_seq_len": 4096,
  "gpu_split_auto": false,
  "autosplit_reserve": [
    96
  ],
  "gpu_split": [
    14.15,
    14,
    15
  ],
  "rope_scale": 1,
  "rope_alpha": 1,
  "no_flash_attention": false,
  "cache_mode": "fp16",
  "prompt_template": "string",
  "num_experts_per_token": 0,
  "use_cfg": true,
  "fasttensors": false,
  "skip_queue": false
}
```
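
The payload above can also be submitted programmatically. A minimal sketch, assuming a local TabbyAPI instance on its default port 5000 and the `/v1/model/load` admin endpoint; the URL and admin key are placeholders for your own deployment:

```python
import requests

TABBY_URL = "http://localhost:5000"  # assumption: default local TabbyAPI port
ADMIN_KEY = "your-admin-key"         # placeholder: from TabbyAPI's api_tokens.yml

# Trimmed-down version of the load payload above; omitted fields
# keep their TabbyAPI defaults.
payload = {
    "name": "machinez_zephyr-orpo-141b-A35b-v0.1_1.5bpw",
    "max_seq_len": 4096,
    "gpu_split_auto": False,
    "gpu_split": [14.15, 14, 15],
    "cache_mode": "fp16",
}

resp = requests.post(
    f"{TABBY_URL}/v1/model/load",
    headers={"x-admin-key": ADMIN_KEY},
    json=payload,
)
resp.raise_for_status()
print(resp.text)
```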

Sample configuration to load in TabbyAPI at 2.75 bpw on 4x Nvidia Tesla P100 16 GB with 16k context (~5.6 tok/s):

```json
{
  "name": "machinez_zephyr-orpo-141b-A35b-v0.1_2.75bpw",
  "max_seq_len": 16384,
  "override_base_seq_len": 16384,
  "gpu_split_auto": false,
  "autosplit_reserve": [
    96
  ],
  "gpu_split": [
    12.5,
    13,
    13,
    16.1
  ],
  "rope_scale": 1,
  "rope_alpha": 1,
  "no_flash_attention": false,
  "cache_mode": "fp16",
  "prompt_template": "string",
  "num_experts_per_token": 0,
  "use_cfg": true,
  "fasttensors": false,
  "skip_queue": false
}
```
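
Once a quant is loaded, TabbyAPI serves an OpenAI-style completions endpoint, so a quick smoke test can look like the sketch below. The URL, key header, prompt, and sampling values are assumptions; adjust them to your instance:

```python
import requests

TABBY_URL = "http://localhost:5000"  # assumption: default local TabbyAPI port
API_KEY = "your-api-key"             # placeholder: from TabbyAPI's api_tokens.yml

resp = requests.post(
    f"{TABBY_URL}/v1/completions",
    headers={"x-api-key": API_KEY},
    json={
        "prompt": "The capital of France is",
        "max_tokens": 16,
        "temperature": 0.7,
    },
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```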

## Download instructions

With git:

```shell
git clone --single-branch --branch 2_75 https://huggingface.co/machinez/zephyr-orpo-141b-A35b-v0.1-exl2
```

With the huggingface_hub CLI (credit to TheBloke for the instructions, borrowed from bartowski):

```shell
pip3 install -U "huggingface_hub[cli]"
```

(Optional) store credentials and log in:

```shell
git config --global credential.helper 'store --file ~/.my-credentials'
huggingface-cli login
```

To download the main branch (only useful if you just want the measurement.json) to a folder called machinez_zephyr-orpo-141b-A35b-v0.1-exl2:

```shell
mkdir machinez_zephyr-orpo-141b-A35b-v0.1-exl2
huggingface-cli download machinez/zephyr-orpo-141b-A35b-v0.1-exl2 --local-dir machinez_zephyr-orpo-141b-A35b-v0.1-exl2 --local-dir-use-symlinks False
```

To download from a different branch, add the --revision parameter:

```shell
mkdir machinez_zephyr-orpo-141b-A35b-v0.1-exl2_2.75bpw
huggingface-cli download machinez/zephyr-orpo-141b-A35b-v0.1-exl2 --revision 2_75 --local-dir machinez_zephyr-orpo-141b-A35b-v0.1-exl2_2.75bpw --local-dir-use-symlinks False
```
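
The same branch download can also be scripted from Python with huggingface_hub's snapshot_download; a minimal sketch, where the target folder name is just a convention:

```python
from huggingface_hub import snapshot_download

# Fetch only the 2_75 branch into a local folder.
snapshot_download(
    repo_id="machinez/zephyr-orpo-141b-A35b-v0.1-exl2",
    revision="2_75",
    local_dir="machinez_zephyr-orpo-141b-A35b-v0.1-exl2_2.75bpw",
)
```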