---
license: llama2
datasets:
- ehartford/wizard_vicuna_70k_unfiltered
tags:
- uncensored
- wizard
- vicuna
- llama
---
This is an fp16 copy of [jarradh/llama2_70b_chat_uncensored](https://huggingface.co/jarradh/llama2_70b_chat_uncensored), provided for faster downloads and lower disk usage than the fp32 original. I simply loaded the model on CPU with `torch_dtype=torch.float16` and then exported it again. All credit for the model goes to [jarradh](https://huggingface.co/jarradh).
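
For anyone who wants to reproduce the conversion, the process roughly amounts to the following. This is a minimal sketch using 🤗 Transformers, not the exact script used; the output path is illustrative, and you need enough CPU RAM to hold the weights:

```python
# Rough sketch of the fp16 conversion described above (not the author's exact script).
# Loading with torch_dtype=torch.float16 casts the fp32 weights to half precision
# as they are loaded; save_pretrained then writes the fp16 copy back to disk.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

src = "jarradh/llama2_70b_chat_uncensored"   # fp32 original
dst = "./llama2_70b_chat_uncensored-fp16"    # illustrative output directory

model = AutoModelForCausalLM.from_pretrained(
    src,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,  # stream weights in to keep peak CPU RAM usage down
)
tokenizer = AutoTokenizer.from_pretrained(src)

model.save_pretrained(dst)
tokenizer.save_pretrained(dst)
```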

Arguably, a better name for this model would be something like Llama-2-70B_Wizard-Vicuna-Uncensored-fp16, but to avoid confusion I'm sticking with jarradh's naming scheme.

<!-- repositories-available start -->
## Repositories available

* [GPTQ models for GPU inference, with multiple quantisation parameter options.](https://huggingface.co/TheBloke/llama2_70b_chat_uncensored-GPTQ)
* [2, 3, 4, 5, 6 and 8-bit GGML models for CPU+GPU inference](https://huggingface.co/TheBloke/llama2_70b_chat_uncensored-GGML)
* [2, 3, 4, 5, 6 and 8-bit GGUF models for CPU+GPU inference, plus fp16 GGUF for requantizing](https://huggingface.co/YokaiKoibito/llama2_70b_chat_uncensored-GGUF)
* [Jarrad Hope's unquantised model in fp16 pytorch format, for GPU inference and further conversions](https://huggingface.co/YokaiKoibito/llama2_70b_chat_uncensored-fp16)
* [Jarrad Hope's original unquantised fp32 model in pytorch format, for further conversions](https://huggingface.co/jarradh/llama2_70b_chat_uncensored)

<!-- repositories-available end -->

## Prompt template: Human-Response

```
### HUMAN:
{prompt}

### RESPONSE:
```
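
As an illustration, the template can be filled in and used for generation with 🤗 Transformers as sketched below. The example question and generation settings are arbitrary, and `device_map="auto"` assumes `accelerate` is installed:

```python
# Minimal sketch of applying the Human-Response template to this repo's fp16 weights.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "YokaiKoibito/llama2_70b_chat_uncensored-fp16"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

prompt = "What is the capital of France?"  # example question (arbitrary)
formatted = f"### HUMAN:\n{prompt}\n\n### RESPONSE:\n"

inputs = tokenizer(formatted, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)

# Print only the newly generated tokens, not the echoed prompt
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```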