Spaces: Running on CPU Upgrade
Miaoran000 committed
Commit • 066863b · 1 Parent(s): 2f52d69
Update src/display/about.py

src/display/about.py CHANGED (+7 -55)
@@ -24,7 +24,7 @@ TITLE = """<h1 align="center" id="space-title">Hughes Hallucination Evaluation M
 # What does your leaderboard evaluate?
 INTRODUCTION_TEXT = """
 This leaderboard (by [Vectara](https://vectara.com)) evaluates how often an LLM introduces hallucinations when summarizing a document. <br>
-The leaderboard utilizes [
+The leaderboard utilizes the HHEM-2.1 hallucination detection model. The open-source version of HHEM-2.1 can be found [here](https://huggingface.co/vectara/hallucination_evaluation_model).<br>

 """

@@ -38,9 +38,9 @@ Hallucinations refer to instances where a model introduces factually incorrect o

 ## How it works

-Using [Vectara](https://vectara.com)'s HHEM, we measure the occurrence of hallucinations in generated summaries.
+Using [Vectara](https://vectara.com)'s HHEM-2.1 hallucination evaluation model, we measure the occurrence of hallucinations in generated summaries.
 Given a source document and a summary generated by an LLM, HHEM outputs a hallucination score between 0 and 1, with 0 indicating complete hallucination and 1 representing perfect factual consistency.
-The model card for HHEM can be found [here](https://huggingface.co/vectara/hallucination_evaluation_model).
+The model card for HHEM-2.1-Open, which is the open-source version of HHEM-2.1, can be found [here](https://huggingface.co/vectara/hallucination_evaluation_model).

 ## Evaluation Dataset

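To make the scoring step above concrete, here is a minimal sketch of scoring one (document, summary) pair with HHEM-2.1-Open, following the usage shown on its model card; the pair contents are made up, and the exact interface should be verified against the model card linked in the diff.

```python
# Minimal sketch: score one (source document, summary) pair with
# HHEM-2.1-Open. The predict() call follows the usage shown on
# https://huggingface.co/vectara/hallucination_evaluation_model --
# verify it there before relying on this.
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "vectara/hallucination_evaluation_model", trust_remote_code=True
)

pairs = [
    # (source document, LLM-generated summary) -- toy example
    ("The company reported third-quarter revenue of $10 million.",
     "Revenue reached $10 million in Q3."),
]

scores = model.predict(pairs)  # ~1.0 = factually consistent, ~0.0 = hallucinated
print(scores)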
@@ -60,59 +60,11 @@ If you would like to submit your model that is not available on the Hugging Face
 ## Model Submissions and Reproducibility
 You can submit your model for evaluation whether or not it is hosted on the Hugging Face model hub (hosting it on the Hub is recommended).

-###
-1)
-2)
-
+### Evaluation with HHEM-2.1-Open Locally
+1) You can access the generated summaries from models on the leaderboard [here](https://huggingface.co/datasets/vectara/leaderboard_results). The text generation prompt is available under the "Prompt Used" section of the repository's README.
+2) Check [here](https://huggingface.co/vectara/hallucination_evaluation_model) for more details on using HHEM-2.1-Open.
+Please note that our leaderboard is scored with the HHEM-2.1 model, which excels at hallucination detection. While we offer HHEM-2.1-Open as an open-source alternative, it may produce slightly different results.

-### For models available on the Hugging Face model hub:
-To replicate the evaluation result for a Hugging Face model:
-
-1) Clone the repository:
-```bash
-git lfs install
-git clone https://huggingface.co/spaces/vectara/leaderboard
-```
-2) Install the requirements:
-```bash
-pip install -r requirements.txt
-```
-3) Set up your Hugging Face token:
-```bash
-export HF_TOKEN=your_token
-```
-4) Run the evaluation script:
-```bash
-python main_backend.py --model your_model_id --precision float16
-```
-5) Check the results:
-After the evaluation, results are saved in "eval-results-bk/your_model_id/results.json".
-
-## Results Format
-The results are structured in JSON as follows:
-```json
-{
-  "config": {
-    "model_dtype": "float16",
-    "model_name": "your_model_id",
-    "model_sha": "main"
-  },
-  "results": {
-    "hallucination_rate": {
-      "hallucination_rate": ...
-    },
-    "factual_consistency_rate": {
-      "factual_consistency_rate": ...
-    },
-    "answer_rate": {
-      "answer_rate": ...
-    },
-    "average_summary_length": {
-      "average_summary_length": ...
-    }
-  }
-}
-```
 For additional queries or model submissions, please contact [email protected].
 """

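As a quick check of the removed "Results Format" section, a small script can read the step-5 output back; the path and key layout follow the JSON shown in the diff, with "your_model_id" as a placeholder.

```python
# Sketch: read back the evaluation output described in step 5.
# "your_model_id" is a placeholder; the keys mirror the Results
# Format JSON shown in the diff above.
import json

with open("eval-results-bk/your_model_id/results.json") as f:
    data = json.load(f)

print("model:", data["config"]["model_name"])
print("hallucination rate:", data["results"]["hallucination_rate"]["hallucination_rate"])
print("factual consistency:", data["results"]["factual_consistency_rate"]["factual_consistency_rate"])
```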
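Finally, the added lines point to a shared summaries dataset for local HHEM-2.1-Open scoring; fetching it might look like the sketch below. The repo id comes from the diff, but the config/split/column layout is an assumption, so inspect the dataset card first.

```python
# Sketch: fetch the leaderboard's generated summaries for local
# HHEM-2.1-Open scoring. The repo id is taken from the diff above;
# the split/column layout is an assumption -- see the dataset card.
from datasets import load_dataset

ds = load_dataset("vectara/leaderboard_results")
print(ds)  # inspect available splits/columns before scoring
```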