Miaoran000 committed on
Commit
066863b
1 Parent(s): 2f52d69

Update src/display/about.py

Files changed (1)
  1. src/display/about.py +7 -55
src/display/about.py CHANGED
@@ -24,7 +24,7 @@ TITLE = """<h1 align="center" id="space-title">Hughes Hallucination Evaluation M
24
  # What does your leaderboard evaluate?
25
  INTRODUCTION_TEXT = """
26
  This leaderboard (by [Vectara](https://vectara.com)) evaluates how often an LLM introduces hallucinations when summarizing a document. <br>
27
- The leaderboard utilizes [HHEM](https://huggingface.co/vectara/hallucination_evaluation_model), an open source hallucination detection model.<br>
28
 
29
  """
30
 
@@ -38,9 +38,9 @@ Hallucinations refer to instances where a model introduces factually incorrect o
38
 
39
  ## How it works
40
 
41
- Using [Vectara](https://vectara.com)'s HHEM, we measure the occurrence of hallucinations in generated summaries.
42
  Given a source document and a summary generated by an LLM, HHEM outputs a hallucination score between 0 and 1, with 0 indicating complete hallucination and 1 representing perfect factual consistency.
43
- The model card for HHEM can be found [here](https://huggingface.co/vectara/hallucination_evaluation_model).
44
 
45
  ## Evaluation Dataset
46
 
@@ -60,59 +60,11 @@ If you would like to submit your model that is not available on the Hugging Face
60
  ## Model Submissions and Reproducibility
61
  You can submit your model for evaluation, whether it's hosted on the Hugging Face model hub or not. (Though it is recommended to host your model on the Hugging Face)
62
 
63
- ### For models not available on the Hugging Face model hub:
64
- 1) Access generated summaries used for evaluation [here](https://huggingface.co/datasets/vectara/leaderboard_results).
65
- 2) The text generation prompt is available under "Prompt Used" section in the repository's README.
66
- 3) Details on API Integration for evaluations are under "API Integration Details".
67
 
68
- ### For models available on the Hugging Face model hub:
69
- To replicate the evaluation result for a Hugging Face model:
70
-
71
- 1) Clone the Repository
72
- ```python
73
- git lfs install
74
- git clone https://huggingface.co/spaces/vectara/leaderboard
75
- ```
76
- 2) Install the Requirements
77
- ```python
78
- pip install -r requirements.txt
79
- ```
80
- 3) Set Up Your Hugging Face Token
81
- ```python
82
- export HF_TOKEN=your_token
83
- ```
84
- 4) Run the Evaluation Script
85
- ```python
86
- python main_backend.py --model your_model_id --precision float16
87
- ```
88
- 5) Check Results
89
- After the evaluation, results are saved in "eval-results-bk/your_model_id/results.json".
90
-
91
- ## Results Format
92
- The results are structured in JSON as follows:
93
- ```python
94
- {
95
- "config": {
96
- "model_dtype": "float16",
97
- "model_name": "your_model_id",
98
- "model_sha": "main"
99
- },
100
- "results": {
101
- "hallucination_rate": {
102
- "hallucination_rate": ...
103
- },
104
- "factual_consistency_rate": {
105
- "factual_consistency_rate": ...
106
- },
107
- "answer_rate": {
108
- "answer_rate": ...
109
- },
110
- "average_summary_length": {
111
- "average_summary_length": ...
112
- }
113
- }
114
- }
115
- ```
116
  For additional queries or model submissions, please contact [email protected].
117
  """
118
 
 
24
  # What does your leaderboard evaluate?
25
  INTRODUCTION_TEXT = """
26
  This leaderboard (by [Vectara](https://vectara.com)) evaluates how often an LLM introduces hallucinations when summarizing a document. <br>
27
+ The leaderboard utilizes HHEM-2.1 hallucination detection model. The open source version of HHEM-2.1 can be found [here](https://huggingface.co/vectara/hallucination_evaluation_model).<br>
28
 
29
  """
30
 
 
38
 
39
  ## How it works
40
 
41
+ Using [Vectara](https://vectara.com)'s HHEM-2.1 hallucination evaluation model, we measure the occurrence of hallucinations in generated summaries.
42
  Given a source document and a summary generated by an LLM, HHEM outputs a hallucination score between 0 and 1, with 0 indicating complete hallucination and 1 representing perfect factual consistency.
43
+ The model card for HHEM-2.1-Open, which is the open source version of HHEM-2.1, can be found [here](https://huggingface.co/vectara/hallucination_evaluation_model).
44
 
45
  ## Evaluation Dataset
46
 
 
60
  ## Model Submissions and Reproducibility
61
  You can submit your model for evaluation, whether it's hosted on the Hugging Face model hub or not. (Though it is recommended to host your model on the Hugging Face)
62
 
63
+ ### Evaluation with HHEM-2.1-Open Locally
64
+ 1) You can access generated summaries from models on the leaderboard [here](https://huggingface.co/datasets/vectara/leaderboard_results). The text generation prompt is available under "Prompt Used" section in the repository's README.
65
+ 2) Check [here](https://huggingface.co/vectara/hallucination_evaluation_model) for more details on using HHEM-2.1-Open.
66
+ Please note that our leaderboard is scored based on the HHEM-2.1 model, which excels in hallucination detection. While we offer HHEM-2.1-Open as an open-source alternative, it may produce slightly different results.
67
 
68
  For additional queries or model submissions, please contact [email protected].
69
  """
70
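
Note on the removed "Results Format" block: the four keys it documented (`hallucination_rate`, `factual_consistency_rate`, `answer_rate`, `average_summary_length`) can still be reproduced locally once you have per-summary scores from HHEM-2.1-Open. Below is a minimal sketch of that aggregation step. The function name `aggregate_scores`, the 0.5 threshold, and the exact metric definitions (rates as percentages, length in whitespace-separated words, empty summaries counted as unanswered) are assumptions made for illustration and are not taken from this commit. Per the note added in this diff, numbers from the open model may also differ slightly from the leaderboard's HHEM-2.1 results.

```python
from statistics import mean

# Assumption: a summary counts as factually consistent when its HHEM score
# is at least this value. The threshold is illustrative, not from the commit.
CONSISTENCY_THRESHOLD = 0.5


def aggregate_scores(summaries, scores, threshold=CONSISTENCY_THRESHOLD):
    """Aggregate per-summary HHEM scores into leaderboard-style metrics.

    summaries: generated summaries; an empty string is treated as "no answer"
               (an assumption about how refusals are represented).
    scores:    floats in [0, 1] aligned with `summaries`, where 0 means
               complete hallucination and 1 means perfect consistency.
    """
    if len(summaries) != len(scores):
        raise ValueError("summaries and scores must have the same length")

    # Keep only the summaries the model actually produced.
    answered = [(s, sc) for s, sc in zip(summaries, scores) if s.strip()]
    if not answered:
        raise ValueError("no non-empty summaries to score")

    consistent = sum(1 for _, sc in answered if sc >= threshold)
    factual_rate = 100 * consistent / len(answered)

    return {
        "hallucination_rate": 100 - factual_rate,
        "factual_consistency_rate": factual_rate,
        "answer_rate": 100 * len(answered) / len(summaries),
        # Length is measured in whitespace-separated words (assumption).
        "average_summary_length": mean(len(s.split()) for s, _ in answered),
    }
```

Usage: pass the summaries downloaded from the leaderboard results dataset together with the scores you obtained from HHEM-2.1-Open, for example `aggregate_scores(["a b c", ""], [0.9, 0.0])`.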