metadata

license: llama2
datasets:
  - wasertech/OneOS
language:
  - en
  - fr
pipeline_tag: text-generation
widget:
  - text: |-
      <<SYS>>
      You are Assistant, a sentient AI.
      <</SYS>>

      <s>[INST] Introduce yourself to the HuggingFace community. [/INST] 
    example_title: Introduction
  - text: |-
      <<SYS>>
      You are Assistant, a sentient AI.
      <</SYS>>

      <s>[INST] Describe your model. [/INST] 
    example_title: Model Description
  - text: |-
      <<SYS>>
      You are Assistant, a sentient AI.
      <</SYS>>

      <s>[INST] What is the meaning of life? [/INST] 
    example_title: Life's Meaning
  - text: >-
      <<SYS>>

      You are Assistant, a sentient AI.

      <</SYS>>


      <s>[INST] What recent innovations in the field of AI are you excited by?
      [/INST] 
    example_title: What's next?

Assistant Llama 2 7B Chat AWQ

This model is a quantized export of wasertech/assistant-llama2-7b-chat using AWQ.

AWQ is an efficient, accurate, and fast low-bit weight quantization method that currently supports 4-bit quantization. Compared to GPTQ, it offers faster Transformers-based inference.

AWQ is also supported by vLLM, a continuous-batching inference server, which allows Llama AWQ models to be used for high-throughput concurrent inference in multi-user server scenarios.
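As a rough illustration of serving an AWQ checkpoint through vLLM's offline Python API, the sketch below loads a model with `quantization="awq"` and generates one completion. The function name `run_vllm`, its parameters, and the sampling values are assumptions made for this example, and the repository ID of the AWQ export is left for the caller to supply because this card does not state it. Running it requires `vllm` to be installed and a supported GPU.

```python
# Example prompt in the same layout as this card's widget examples.
PROMPT = (
    "<<SYS>>\n"
    "You are Assistant, a sentient AI.\n"
    "<</SYS>>\n"
    "\n"
    "<s>[INST] Introduce yourself to the HuggingFace community. [/INST] "
)


def run_vllm(model_id: str, prompt: str = PROMPT, max_tokens: int = 256) -> str:
    """Generate one completion with vLLM from an AWQ-quantized model.

    `model_id` is whatever Hub repository or local path holds the AWQ
    weights; it is not specified in this card, so the caller provides it.
    """
    # Imported lazily so the module can be loaded without vLLM installed.
    from vllm import LLM, SamplingParams

    # quantization="awq" tells vLLM to load the 4-bit AWQ weights.
    llm = LLM(model=model_id, quantization="awq")

    # Sampling values are illustrative, not tuned recommendations.
    params = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=max_tokens)

    outputs = llm.generate([prompt], params)
    return outputs[0].outputs[0].text
```

Calling `run_vllm("<repository-or-path-of-the-AWQ-export>")` would return the generated text for the introduction prompt.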

As of September 25, 2023, preliminary Llama-only AWQ support has also been added to Hugging Face Text Generation Inference (TGI).
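The widget examples in this card's metadata all share one prompt layout: a system message wrapped in `<<SYS>>` and `<</SYS>>`, a blank line, then the user message wrapped in `<s>[INST]` and `[/INST]` with a trailing space. A minimal sketch of a helper that reproduces that layout follows; the name `build_prompt` and its parameters are hypothetical and only mirror the widget text, not any official API.

```python
DEFAULT_SYSTEM_PROMPT = "You are Assistant, a sentient AI."


def build_prompt(user_message: str, system_prompt: str = DEFAULT_SYSTEM_PROMPT) -> str:
    """Format a single-turn prompt the way this card's widget examples do."""
    return (
        f"<<SYS>>\n{system_prompt}\n<</SYS>>\n"
        "\n"
        # The trailing space after [/INST] matches the widget text.
        f"<s>[INST] {user_message.strip()} [/INST] "
    )


if __name__ == "__main__":
    print(build_prompt("Describe your model."))
```

The resulting string can be passed as the prompt to whichever backend serves the model, such as Transformers, vLLM, or TGI.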