Upload 9 files


Files changed (10)

.gitattributes +6 -0
README.md +142 -0
USE_POLICY.md +47 -0
bode-7b-alpaca-f16.gguf +3 -0
bode-7b-alpaca-f32.gguf +3 -0
bode-7b-alpaca-q4_0.gguf +3 -0
bode-7b-alpaca-q4_k_m.gguf +3 -0
bode-7b-alpaca-q5_k_m.gguf +3 -0
bode-7b-alpaca-q8_0.gguf +3 -0
config.json +3 -0

.gitattributes CHANGED

@@ -33,3 +33,9 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+bode-7b-alpaca-f16.gguf filter=lfs diff=lfs merge=lfs -text
+bode-7b-alpaca-f32.gguf filter=lfs diff=lfs merge=lfs -text
+bode-7b-alpaca-q4_0.gguf filter=lfs diff=lfs merge=lfs -text
+bode-7b-alpaca-q4_k_m.gguf filter=lfs diff=lfs merge=lfs -text
+bode-7b-alpaca-q5_k_m.gguf filter=lfs diff=lfs merge=lfs -text
+bode-7b-alpaca-q8_0.gguf filter=lfs diff=lfs merge=lfs -text

README.md CHANGED

@@ -1,3 +1,145 @@
 ---
 license: mit
+language:
+- pt
+- en
+metrics:
+- accuracy
+- f1
+- precision
+- recall
+pipeline_tag: text-generation
+tags:
+- LLM
+- Portuguese
+- Bode
+- Alpaca
+- Llama 2
+- Q&A
+inference: false
 ---
+# BODE
+<!--- PROJECT LOGO -->
+<p align="center">
+  <img src="https://huggingface.co/recogna-nlp/bode-7b-alpaca-pt-br/resolve/main/Logo_Bode_LLM_Circle.png" alt="Bode Logo" width="400" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
+</p>
+Bode is a large language model (LLM) for Portuguese, developed from Llama 2 by fine-tuning on the Alpaca dataset as translated into Portuguese by the authors of Cabrita. The model is designed for Portuguese natural language processing tasks such as text generation, machine translation, text summarization, and more.
+BODE was developed to address the scarcity of LLMs for the Portuguese language. Classic models, such as LLaMa itself, can respond to prompts in Portuguese, but they are prone to grammatical errors and sometimes answer in English. Few Portuguese models are available for free use and, to our knowledge, none with 13b parameters or more has been trained specifically on Portuguese data.
+See the [paper](https://arxiv.org/abs/2401.02909) for more information about Bode.
+## Model Details
+- **Base Model:** Llama 2
+- **Training Dataset:** Alpaca
+- **Language:** Portuguese
+## Available Versions
+| Number of parameters           | PEFT | Model                                                                                       |
+| :-:                            | :-:  |  :-:                                                                                         |
+| 7b                             | &check; | [recogna-nlp/bode-7b-alpaca-pt-br](https://huggingface.co/recogna-nlp/bode-7b-alpaca-pt-br)  |
+| 13b                            | &check; | [recogna-nlp/bode-13b-alpaca-pt-br](https://huggingface.co/recogna-nlp/bode-13b-alpaca-pt-br)|
+| 7b                             |    | [recogna-nlp/bode-7b-alpaca-pt-br-no-peft](https://huggingface.co/recogna-nlp/bode-7b-alpaca-pt-br-no-peft)  |
+## Usage
+We strongly recommend using Kaggle with a GPU. You can easily use Bode with the HuggingFace Transformers library. However, you need authorized access to LLaMa 2. We also provide a Jupyter notebook on Google Colab; [click here](https://colab.research.google.com/drive/1EBS1uNT09fqlwnXf_lyDtfYyuF4Ow0Pq?usp=sharing) to access it.
+Below is a simple example of how to load the model and generate text:
+```python
+# Required installs (run in a notebook cell)
+!pip install transformers
+!pip install einops accelerate bitsandbytes
+!pip install sentence_transformers
+from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig
+llm_model = 'recogna-nlp/bode-7b-alpaca-pt-br-no-peft'
+hf_auth = 'HF_ACCESS_KEY'
+model = AutoModelForCausalLM.from_pretrained(llm_model, trust_remote_code=True, return_dict=True, load_in_8bit=True, device_map='auto', token=hf_auth)
+tokenizer = AutoTokenizer.from_pretrained(llm_model, token=hf_auth)
+model.eval()
+# Testing text generation
+def generate_prompt(instruction, input=None):
+    if input:
+        return f"""Abaixo está uma instrução que descreve uma tarefa, juntamente com uma entrada que fornece mais contexto. Escreva uma resposta que complete adequadamente o pedido.
+### Instrução:
+{instruction}
+### Entrada:
+{input}
+### Resposta:"""
+    else:
+        return f"""Abaixo está uma instrução que descreve uma tarefa. Escreva uma resposta que complete adequadamente o pedido.
+### Instrução:
+{instruction}
+### Resposta:"""
+generation_config = GenerationConfig(
+    temperature=0.2,
+    top_p=0.75,
+    num_beams=2,
+    do_sample=True
+)
+def evaluate(instruction, input=None):
+    prompt = generate_prompt(instruction, input)
+    inputs = tokenizer(prompt, return_tensors="pt")
+    input_ids = inputs["input_ids"].cuda()
+    generation_output = model.generate(
+        input_ids=input_ids,
+        generation_config=generation_config,
+        return_dict_in_generate=True,
+        output_scores=True,
+        max_length=300
+    )
+    for s in generation_output.sequences:
+        output = tokenizer.decode(s)
+        print("Resposta:", output.split("### Resposta:")[1].strip())
+evaluate("Responda com detalhes: O que é um bode?")
+# Example response (may vary because of the sampling temperature): Um bode é um animal do gênero Bubalus, da família Bovidae, que é um membro da ordem Artiodactyla. Os bodes são mamíferos herbívoros que são nativos da Ásia, África e Europa. Eles são conhecidos por seus cornos, que podem ser usados para defesa e como uma ferramenta.
+```
+## Training and Data
+Bode was trained by fine-tuning Llama 2 on the Portuguese Alpaca dataset, an instruction-based dataset. Training was carried out on the Santos Dumont supercomputer at LNCC, through Fundunesp project 2019/00697-8.
+## Citation
+If you want to use Bode in your research, you can cite this [paper](https://arxiv.org/abs/2401.02909), which discusses the model in more detail. Cite it as follows:
+```
+    @misc{bode2024,
+      title={Introducing Bode: A Fine-Tuned Large Language Model for Portuguese Prompt-Based Task},
+      author={Gabriel Lino Garcia and Pedro Henrique Paiola and Luis Henrique Morelli and Giovani Candido and Arnaldo Cândido Júnior and Danilo Samuel Jodas and Luis C. S. Afonso and Ivan Rizzo Guilherme and Bruno Elias Penteado and João Paulo Papa},
+      year={2024},
+      eprint={2401.02909},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL}
+}
+```
+## Contributions
+Contributions to improve this model are welcome. Feel free to open issues and pull requests.
+## Acknowledgments
+We thank the Laboratório Nacional de Computação Científica (LNCC/MCTI, Brazil) for providing the HPC resources of the SDumont supercomputer.
+```
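
The README example targets the Transformers checkpoint, while this commit uploads GGUF files. The following is a minimal sketch of how one of those files might be run locally, assuming the `llama-cpp-python` package is installed and that the GGUF conversions expect the same Alpaca-style prompt as the README (neither is stated in the commit). The names `build_prompt` and `generate_with_gguf`, the context size, and the stop string are illustrative choices, not part of the repository.

```python
def build_prompt(instruction, context=None):
    """Build the Portuguese Alpaca-style prompt used in the README example."""
    if context:
        return (
            "Abaixo está uma instrução que descreve uma tarefa, juntamente com uma "
            "entrada que fornece mais contexto. Escreva uma resposta que complete "
            "adequadamente o pedido.\n"
            f"### Instrução:\n{instruction}\n"
            f"### Entrada:\n{context}\n"
            "### Resposta:"
        )
    return (
        "Abaixo está uma instrução que descreve uma tarefa. Escreva uma resposta "
        "que complete adequadamente o pedido.\n"
        f"### Instrução:\n{instruction}\n"
        "### Resposta:"
    )


def generate_with_gguf(model_path, instruction, context=None, max_tokens=256):
    """Run one prompt through a local GGUF file (hypothetical helper)."""
    # Assumption: llama-cpp-python is installed; imported lazily so that
    # build_prompt stays usable without it.
    from llama_cpp import Llama

    llm = Llama(model_path=model_path, n_ctx=2048)
    result = llm(
        build_prompt(instruction, context),
        max_tokens=max_tokens,
        temperature=0.2,   # same sampling values as the README example
        top_p=0.75,
        stop=["### Instrução:"],  # assumed stop marker to avoid run-on turns
    )
    return result["choices"][0]["text"].strip()
```

A call such as `generate_with_gguf("bode-7b-alpaca-q4_k_m.gguf", "O que é um bode?")` would then return only the generated answer, provided the file has been downloaded to the working directory.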

USE_POLICY.md ADDED

	@@ -0,0 +1,47 @@

+# Bode Acceptable Use Policy
+Bode was obtained from fine-tuning Llama 2, so we followed the same Use Policy established by Meta. If you access or use Bode, you agree to this Acceptable Use Policy (“Policy”). The most recent copy of this policy can be found at [ai.meta.com/llama/use-policy](http://ai.meta.com/llama/use-policy).
+## Prohibited Uses
+We want everyone to use Bode safely and responsibly. You agree you will not use, or allow others to use, Bode to:
+1. Violate the law or others’ rights, including to:
+    1. Engage in, promote, generate, contribute to, encourage, plan, incite, or further illegal or unlawful activity or content, such as:
+        1. Violence or terrorism
+        2. Exploitation or harm to children, including the solicitation, creation, acquisition, or dissemination of child exploitative content or failure to report Child Sexual Abuse Material
+        3. Human trafficking, exploitation, and sexual violence
+        4. The illegal distribution of information or materials to minors, including obscene materials, or failure to employ legally required age-gating in connection with such information or materials.
+        5. Sexual solicitation
+        6. Any other criminal activity
+    2. Engage in, promote, incite, or facilitate the harassment, abuse, threatening, or bullying of individuals or groups of individuals
+    3. Engage in, promote, incite, or facilitate discrimination or other unlawful or harmful conduct in the provision of employment, employment benefits, credit, housing, other economic benefits, or other essential goods and services
+    4. Engage in the unauthorized or unlicensed practice of any profession including, but not limited to, financial, legal, medical/health, or related professional practices
+    5. Collect, process, disclose, generate, or infer health, demographic, or other sensitive personal or private information about individuals without rights and consents required by applicable laws
+    6. Engage in or facilitate any action or generate any content that infringes, misappropriates, or otherwise violates any third-party rights, including the outputs or results of any products or services using the Bode Materials
+    7. Create, generate, or facilitate the creation of malicious code, malware, computer viruses or do anything else that could disable, overburden, interfere with or impair the proper working, integrity, operation or appearance of a website or computer system
+2. Engage in, promote, incite, facilitate, or assist in the planning or development of activities that present a risk of death or bodily harm to individuals, including use of Bode related to the following:
+    1. Military, warfare, nuclear industries or applications, espionage, use for materials or activities that are subject to the International Traffic Arms Regulations (ITAR) maintained by the United States Department of State
+    2. Guns and illegal weapons (including weapon development)
+    3. Illegal drugs and regulated/controlled substances
+    4. Operation of critical infrastructure, transportation technologies, or heavy machinery
+    5. Self-harm or harm to others, including suicide, cutting, and eating disorders
+    6. Any content intended to incite or promote violence, abuse, or any infliction of bodily harm to an individual
+3. Intentionally deceive or mislead others, including use of Bode related to the following:
+    1. Generating, promoting, or furthering fraud or the creation or promotion of disinformation
+    2. Generating, promoting, or furthering defamatory content, including the creation of defamatory statements, images, or other content
+    3. Generating, promoting, or further distributing spam
+    4. Impersonating another individual without consent, authorization, or legal right
+    5. Representing that the use of Bode or outputs are human-generated
+    6. Generating or facilitating false online engagement, including fake reviews and other means of fake online engagement
+4. Fail to appropriately disclose to end users any known dangers of your AI system
+Please report any violation of this Policy, software “bug,” or other problems that could lead to a violation of this Policy through one of the following means:
+* Reporting violations of the Acceptable Use Policy or unlicensed uses of Llama: [[email protected]](mailto:[email protected])

bode-7b-alpaca-f16.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:cb1de622281d938214fde908a11f4893d6fa34ee5c8da203a06ea136e7f95e05
+size 13478105696
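
Each `.gguf` entry in this diff is a Git LFS pointer (a `version`, an `oid`, and a `size` line) rather than the model itself. As a rough sketch, a downloaded file could be checked against its pointer as shown below; the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the pointer text is copied from the f16 entry above.

```python
import hashlib

F16_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:cb1de622281d938214fde908a11f4893d6fa34ee5c8da203a06ea136e7f95e05
size 13478105696
"""


def parse_lfs_pointer(text):
    """Split a pointer file into its version, hash algorithm, digest, and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the pointer's size and digest."""
    hasher = hashlib.new(pointer["algo"])
    total = 0
    with open(path, "rb") as handle:
        # Read in chunks so multi-gigabyte files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            total += len(chunk)
    return total == pointer["size"] and hasher.hexdigest() == pointer["digest"]
```

For example, `matches_pointer("bode-7b-alpaca-f16.gguf", parse_lfs_pointer(F16_POINTER))` would report whether a local copy is complete and uncorrupted.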

bode-7b-alpaca-f32.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:20937352a33b8be5d3b3798330b1603c8ec24a69c1716dd27056717acc9a79e6
+size 26954404448

bode-7b-alpaca-q4_0.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:fc93caea8a5a0b7b876a46ec4204ed71aed68df48006323111af7d239f1d8565
+size 3825808000

bode-7b-alpaca-q4_k_m.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5a14a6d2e8a046c8db5837a457bc0ee8a12380bd059865732fd2cdd84f86350d
+size 4081005184

bode-7b-alpaca-q5_k_m.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:21c183335f33ff5a13ea3199fb656a1a33f1c7d152c636ae2859dfa22c07d6eb
+size 4783157888

bode-7b-alpaca-q8_0.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:42f9ecbb9579bc5c74852da6ae3f149039cc5bd1dd1657eea0ed524accc2caf9
+size 7161090688
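
The `size` fields in the six pointers above give a quick way to compare the quantization levels. The sketch below estimates bits per weight for each file by scaling against the f32 file, under the assumption that the f32 file consists almost entirely of 32-bit weights (metadata and any tensors kept at higher precision make this approximate). The function name `approx_bits_per_weight` is illustrative.

```python
# File sizes in bytes, copied from the LFS pointers in this commit.
SIZES = {
    "f32": 26954404448,
    "f16": 13478105696,
    "q8_0": 7161090688,
    "q5_k_m": 4783157888,
    "q4_k_m": 4081005184,
    "q4_0": 3825808000,
}


def approx_bits_per_weight(sizes, reference="f32", reference_bits=32):
    """Estimate bits per weight by scaling each size against the reference file."""
    ref_size = sizes[reference]
    return {name: reference_bits * size / ref_size for name, size in sizes.items()}


for name, bits in approx_bits_per_weight(SIZES).items():
    gib = SIZES[name] / 2**30
    print(f"{name:>7}: {gib:6.2f} GiB, ~{bits:.2f} bits/weight")
```

Estimates like these can help pick a file that fits the available RAM or VRAM; they say nothing about output quality, which has to be evaluated separately.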

config.json ADDED

	@@ -0,0 +1,3 @@

+{
+    "model_type": "llama"
+}