---
library_name: transformers
tags: []
---
**Supported languages:** English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.


## Intended Use

**Intended Use Cases** Llama 3.1 is intended for commercial and research use in multiple languages. Instruction-tuned, text-only models are intended for assistant-like chat, whereas pretrained models can be adapted for a variety of natural language generation tasks. The Llama 3.1 model collection also supports using the outputs of its models to improve other models, including for synthetic data generation and distillation. The Llama 3.1 Community License allows for these use cases.

**Out-of-scope** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in any other way that is prohibited by the Acceptable Use Policy and the Llama 3.1 Community License. Use in languages beyond those explicitly referenced as supported in this model card.

**<span style="text-decoration:underline;">Note</span>**: Llama 3.1 has been trained on a broader collection of languages than the 8 supported languages. Developers may fine-tune Llama 3.1 models for languages beyond the 8 supported languages provided they comply with the Llama 3.1 Community License and the Acceptable Use Policy; in such cases they are responsible for ensuring that any use of Llama 3.1 in additional languages is done in a safe and responsible manner.

## How to use


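The snippet below loads the model and tokenizer, then asks the model to translate a word-segmented Northern Thai sentence into Central Thai.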
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_name = 'Konthee/llama-3.1-8b-instruct-North-Thai'

tokenizer = AutoTokenizer.from_pretrained(base_model_name)
model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Northern Thai input sentence (words separated by spaces)
text = 'มะ ใด จะ เตี้ยง ใคร่ อยาก กิ๋น เข้า เข้า เจ๊า วัน นี้ ก่อ บ่า ได้ กิ๋น ลุก ขวาย บ่า ตัน กิ๋น'
messages = [
    {"role": "system", "content": "translate Northern thai language to Central thai language."},
    {"role": "user", "content": text},
]

# Build the prompt with the model's chat template
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# Stop generation at any of the model's end-of-sequence tokens
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
    tokenizer.convert_tokens_to_ids("<|end_of_text|>"),
]

outputs = model.generate(
    input_ids=input_ids,
    do_sample=True,
    eos_token_id=terminators,
    max_new_tokens=128,
    temperature=0.1,
    top_p=0.9,
)

# Decode only the newly generated tokens, skipping the prompt
out = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(out)
```
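To translate several sentences, the steps above can be wrapped in a small helper. This is a minimal sketch, not part of the model's published API: the function name `translate` and the switch to greedy decoding (`do_sample=False`) are assumptions, and it reuses the `model`, `tokenizer`, and `terminators` objects defined above.

```python
def translate(sentence: str) -> str:
    # Hypothetical convenience wrapper around the example above;
    # assumes `model`, `tokenizer`, and `terminators` are already defined.
    messages = [
        {"role": "system", "content": "translate Northern thai language to Central thai language."},
        {"role": "user", "content": sentence},
    ]
    input_ids = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    outputs = model.generate(
        input_ids=input_ids,
        do_sample=False,  # greedy decoding for more reproducible translations
        eos_token_id=terminators,
        max_new_tokens=128,
    )
    return tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)

print(translate(text))  # translate the example sentence defined above
```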