llmware
/

llama-2-chat-onnx

Model card Files Files and versions Community

llama-2-chat-onnx / README.md

doberst's picture

Update README.md

fd88d37 verified 14 days ago

|

history blame contribute delete

1.15 kB

	---
	license: llama2
	inference: false
	base_model: meta-llama/Llama-2-7b-chat-hf
	base_model_relation: quantized
	tags:
	- green
	- p7
	- llmware-chat
	- onnx
	---

	# llama-2-chat-onnx

	llama-2-chat-onnx is an ONNX int4 quantized version of Llama-2-Chat, providing a fast, small inference implementation, optimized for AI PCs and Windows x86-64 architectures.

	[llama-2-chat](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) is the official chat finetune of the classic Llama 2 model, one of the most most iconic (and still one the best) 7B instruct trained models.


	### Model Description

	- Developed by: meta-llama
	- Quantized by: llmware
	- Model type: llama2
	- Parameters: 7 billion
	- Model Parent: meta-llama/Llama-2-7b-chat-hf
	- Language(s) (NLP): English
	- License: Llama-2 Community License
	- Uses: Fact-based question-answering
	- RAG Benchmark Accuracy Score: NA
	- Quantization: int4


	## Model Card Contact

	[llmware on github](https://www.github.com/llmware-ai/llmware.git)

	[llmware on hf](https://www.huggingface.co/llmware)

	[llmware website](https://www.llmware.ai)