e-palmisano/Phi3-ITA-mini-4K-instruct


	---
	language:
	- it
	license: mit
	tags:
	- text-generation-inference
	- transformers
	- trl
	- sft
	- phi-3
	- phi-3-mini
	- italian
	base_model: microsoft/Phi-3-mini-4k-instruct
	---

	# Uploaded model

	- Developed by: Enzo Palmisano
	- License: mit
	- Fine-tuned from model: microsoft/Phi-3-mini-4k-instruct


	## Evaluation

	For a detailed comparison of model performance, check out the [Leaderboard for Italian Language Models](https://huggingface.co/spaces/FinancialSupport/open_ita_llm_leaderboard).

	Here's a breakdown of the performance metrics:

	| Metric | hellaswag_it acc_norm | arc_it acc_norm | m_mmlu_it 5-shot acc | Average |
	|:----------------------------|:----------------------|:----------------|:---------------------|:--------|
	| Accuracy Normalized | 0.6088 | 0.4440 | X | X |

	---

	## How to Use

	```python
	from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig
	import torch

	device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

	tokenizer = AutoTokenizer.from_pretrained("e-palmisano/Phi3-ITA-mini-4k-instruct", trust_remote_code=True)
	model = AutoModelForCausalLM.from_pretrained("e-palmisano/Phi3-ITA-mini-4k-instruct", trust_remote_code=True)
	model.to(device)


	generation_config = GenerationConfig(
	    penalty_alpha=0.6,  # Balances model confidence against the degeneration penalty in contrastive search (only selected when do_sample=False).
	    do_sample=True,  # Sample from the next-token distribution; set to False for greedy decoding.
	    top_k=5,  # Number of highest-probability vocabulary tokens kept for top-k filtering.
	    temperature=0.001,  # Scales the next-token probabilities; a value this low makes sampling nearly deterministic.
	    repetition_penalty=1.7,  # Penalty on tokens already present in the sequence; 1.0 means no penalty.
	    max_new_tokens=64,  # Maximum number of tokens to generate, not counting the prompt.
	    eos_token_id=tokenizer.eos_token_id,  # Id of the end-of-sequence token.
	    pad_token_id=tokenizer.eos_token_id,  # Reuse the end-of-sequence id as the padding token id.
	)


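	def generate_answer(question):
	    """Wrap the question in the chat template and return the decoded generation."""
	    messages = [
	        {"role": "user", "content": question},
	    ]
	    # add_generation_prompt=True asks the template to open the assistant turn.
	    model_inputs = tokenizer.apply_chat_template(
	        messages, add_generation_prompt=True, return_tensors="pt"
	    ).to(device)
	    outputs = model.generate(model_inputs, generation_config=generation_config)
	    # The decoded text contains the prompt followed by the generated answer.
	    result = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
	    return result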


	question = "Qual è la torre più famosa di Parigi?"  # "What is the most famous tower in Paris?"
	answer = generate_answer(question)
	print(answer)
	```
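
For decoder-only models, `model.generate` returns the prompt tokens followed by the newly generated ones, so the string produced by `generate_answer` above also contains the question. If only the answer is wanted, the prompt can be sliced off before decoding. The following is a minimal sketch on plain Python lists; the helper name `strip_prompt_tokens` and the toy token ids are hypothetical and only illustrate the slicing step.

```python
from typing import List, Sequence


def strip_prompt_tokens(sequence: Sequence[int], prompt_length: int) -> List[int]:
    """Return only the token ids that come after the first `prompt_length` ids."""
    if prompt_length < 0 or prompt_length > len(sequence):
        raise ValueError("prompt_length must be between 0 and len(sequence)")
    return list(sequence[prompt_length:])


# Toy ids standing in for real token ids: 3 prompt tokens, then 2 generated ones.
prompt_ids = [1, 15, 27]
full_output = prompt_ids + [42, 7]

answer_ids = strip_prompt_tokens(full_output, len(prompt_ids))
print(answer_ids)
```

In the snippet above, the equivalent step would be slicing `outputs[0]` from `model_inputs.shape[-1]` onward before calling the tokenizer's decode method, assuming `model_inputs` is the tensor of input ids returned by `apply_chat_template`.
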
	---
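
The `repetition_penalty=1.7` setting used above is fairly aggressive, so it may help to see what such a penalty does to the scores. The sketch below is a pure-Python illustration of the commonly used rule (positive logits of already-seen tokens are divided by the penalty, negative ones are multiplied by it), which is assumed here to match the behavior of the `transformers` implementation. The function name `apply_repetition_penalty` and the example values are hypothetical.

```python
from typing import Iterable, List


def apply_repetition_penalty(
    logits: List[float], seen_token_ids: Iterable[int], penalty: float
) -> List[float]:
    """Lower the score of every token id that already appears in the sequence."""
    if penalty <= 0:
        raise ValueError("penalty must be strictly positive")
    adjusted = list(logits)
    for token_id in set(seen_token_ids):
        score = adjusted[token_id]
        # Dividing a positive score or multiplying a negative one both push it down.
        adjusted[token_id] = score * penalty if score < 0 else score / penalty
    return adjusted


# Vocabulary of three tokens; tokens 0 and 1 were already generated.
logits = [2.0, -1.0, 0.5]
penalized = apply_repetition_penalty(logits, seen_token_ids=[0, 1], penalty=2.0)
print(penalized)  # [1.0, -2.0, 0.5]
```

With `penalty=1.0` the scores are returned unchanged, which is why 1.0 means no penalty.
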