rasyosef committed on
Commit
ccd47e0
1 Parent(s): 267850e

Update README.md

  1. README.md +4 -0
README.md CHANGED
@@ -99,9 +99,13 @@ Note: If you want to use flash attention, call _AutoModelForCausalLM.from_pretra
 
 This model outperforms HuggingFace's SmolLM-1.7B-Instruct and the TinyLlama-1.1B-Chat-v1.0 models on IFEval and GSM8K benchmarks.
 
+- **IFEval (Instruction Following Evaluation)**: IFEval is a fairly interesting dataset that tests the capability of models to clearly follow explicit instructions, such as “include keyword x” or “use format y”. The models are tested on their ability to strictly follow formatting instructions rather than the actual contents generated, allowing strict and rigorous metrics to be used.
+- **GSM8k (5-shot)**: diverse grade school math word problems to measure a model's ability to solve multi-step mathematical reasoning problems.
+
 |Model|Size (# params)|IFEval|GSM8K|
 |:----|:--------------|:-----|:----|
 |[Phi-1_5-Instruct-v0.1](https://huggingface.co/rasyosef/Phi-1_5-Instruct-v0.1)|1.4B|**26.71**|**41.78**|
 |[SmolLM-1.7B-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM-1.7B-Instruct)|1.7B|24.21|3.45|
 |[TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0)|1.1B|21.23|0|
 |[phi-1_5](https://huggingface.co/microsoft/phi-1_5)|1.4B|20.51|31.73|
+
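
The IFEval description added in this commit says responses are scored on whether they follow explicit, checkable instructions such as "include keyword x" or "use format y". A minimal sketch of what such checks could look like follows. This is not the IFEval implementation or the script used to produce the scores in the table; the function names (`includes_keyword`, `is_bulleted_list`, `strict_accuracy`) and the exact matching rules are assumptions made for illustration.

```python
import re


def includes_keyword(response: str, keyword: str) -> bool:
    """Hypothetical check for an "include keyword x" instruction.

    Matches the keyword as a whole word, ignoring case.
    """
    pattern = rf"\b{re.escape(keyword)}\b"
    return re.search(pattern, response, flags=re.IGNORECASE) is not None


def is_bulleted_list(response: str, n_items: int) -> bool:
    """Hypothetical check for a "use format y" instruction.

    Passes only if the response contains exactly n_items markdown bullets.
    """
    bullets = [
        line for line in response.splitlines()
        if re.match(r"^\s*[-*]\s+\S", line)
    ]
    return len(bullets) == n_items


def strict_accuracy(results: list[bool]) -> float:
    """Percentage of checks that passed (0.0 for an empty list)."""
    if not results:
        return 0.0
    return 100.0 * sum(results) / len(results)


if __name__ == "__main__":
    response = "- Phi is a small model\n- It follows instructions"
    checks = [
        includes_keyword(response, "model"),
        is_bulleted_list(response, 2),
    ]
    print(strict_accuracy(checks))
```

Because each check is a deterministic function of the response text, the score does not depend on a judge model or on the quality of the content itself, which is the property the description refers to as "strict and rigorous metrics".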