Miaoran000 committed
Commit 8135339
Parent: 7ef82ad

update text description

Files changed (3):
  1. .gitignore +3 -0
  2. app.py +1 -1
  3. src/display/about.py +5 -6
.gitignore CHANGED
@@ -14,12 +14,15 @@ auto_evals/
 eval-queue-bk/
 eval-results-bk/
 eval-results-bk_hhem21/
+eval-results_hhem21/
+hhem21_server/
 
 src/assets/model_counts.html
 
 generation_results/
 Hallucination Leaderboard Results
 dataset_stats.py
+hhem_v21_eval.py
 
 get_comparison.py
 GPT-4-Turbo_v.s._GPT-4o.csv
app.py CHANGED
@@ -24,7 +24,7 @@ except Exception:
 try:
     print(envs.EVAL_RESULTS_PATH)
     snapshot_download(
-        repo_id=envs.RESULTS_REPO, local_dir=envs.EVAL_RESULTS_PATH, repo_type="dataset", tqdm_class=None, etag_timeout=30
+        repo_id=envs.RESULTS_REPO, revision='hhem21', local_dir=envs.EVAL_RESULTS_PATH, repo_type="dataset", tqdm_class=None, etag_timeout=30
     )
 except Exception:
     restart_space()
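The only functional change here is the new `revision='hhem21'` argument, which pins `snapshot_download` to the `hhem21` branch of the results dataset instead of the default `main`. A minimal standalone sketch of the same call; the repo id and local directory are illustrative placeholders, not the Space's actual `envs.RESULTS_REPO` and `envs.EVAL_RESULTS_PATH` values:

```python
from huggingface_hub import snapshot_download

# Fetch a specific branch (or tag/commit) of a dataset repo instead of "main".
# "your-org/your-results" and "./eval-results" are placeholders standing in
# for envs.RESULTS_REPO and envs.EVAL_RESULTS_PATH in the Space's app.py.
local_path = snapshot_download(
    repo_id="your-org/your-results",
    revision="hhem21",        # branch, tag, or commit hash to pin the download to
    repo_type="dataset",
    local_dir="./eval-results",
    etag_timeout=30,          # seconds to wait on metadata requests before failing
)
print(local_path)  # directory containing the downloaded snapshot
```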
src/display/about.py CHANGED
@@ -25,7 +25,6 @@ TITLE = """<h1 align="center" id="space-title">Hughes Hallucination Evaluation M
 INTRODUCTION_TEXT = """
 This leaderboard (by [Vectara](https://vectara.com)) evaluates how often an LLM introduces hallucinations when summarizing a document. <br>
 The leaderboard utilizes [HHEM](https://huggingface.co/vectara/hallucination_evaluation_model), an open source hallucination detection model.<br>
-An improved version (HHEM v2) is integrated into the [Vectara platform](https://console.vectara.com/signup/?utm_source=huggingface&utm_medium=space&utm_term=integration&utm_content=console&utm_campaign=huggingface-space-integration-console).
 
 """
 
@@ -46,7 +45,7 @@ The model card for HHEM can be found [here](https://huggingface.co/vectara/hallu
 ## Evaluation Dataset
 
 Our evaluation dataset consists of 1006 documents from multiple public datasets, primarily [CNN/Daily Mail Corpus](https://huggingface.co/datasets/cnn_dailymail/viewer/1.0.0/test).
-We generate summaries for each of these documents using submitted LLMs and compute hallucination scores for each pair of document and generated summary. (Check the prompt we used [here](https://huggingface.co/spaces/vectara/Hallucination-evaluation-leaderboard))
+We generate summaries for each of these documents using submitted LLMs and compute hallucination scores for each pair of document and generated summary. (Check the prompt we used [here](https://github.com/vectara/hallucination-leaderboard))
 
 ## Metrics Explained
 - Hallucination Rate: Percentage of summaries with a hallucination score below 0.5
@@ -55,14 +54,14 @@ We generate summaries for each of these documents using submitted LLMs and compu
 - Average Summary Length: The average word count of generated summaries
 
 ## Note on non-Hugging Face models
-On HHEM leaderboard, There are currently models such as GPT variants that are not available on the Hugging Face model hub. We ran the evaluations for these models on our own and uploaded the results to the leaderboard.
-If you would like to submit your model that is not available on the Hugging Face model hub, please contact us at minseok@vectara.com.
+On HHEM leaderboard, there are currently models such as GPT variants that are not available on the Hugging Face model hub. We ran the evaluations for these models on our own and uploaded the results to the leaderboard.
+If you would like to submit your model that is not available on the Hugging Face model hub, please contact us at ofer@vectara.com.
 
 ## Model Submissions and Reproducibility
 You can submit your model for evaluation, whether it's hosted on the Hugging Face model hub or not. (Though it is recommended to host your model on the Hugging Face)
 
 ### For models not available on the Hugging Face model hub:
-1) Access generated summaries used for evaluation [here](https://github.com/vectara/hallucination-leaderboard) in "leaderboard_summaries.csv".
+1) Access generated summaries used for evaluation [here](https://huggingface.co/datasets/vectara/leaderboard_results).
 2) The text generation prompt is available under "Prompt Used" section in the repository's README.
 3) Details on API Integration for evaluations are under "API Integration Details".
 
@@ -114,7 +113,7 @@ The results are structured in JSON as follows:
 }
 }
 ```
-For additional queries or model submissions, please contact minseok@vectara.com.
+For additional queries or model submissions, please contact ofer@vectara.com.
 """
 
 EVALUATION_QUEUE_TEXT = """
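For reference, the "Metrics Explained" entries in the diff above lend themselves to a short computation. A minimal sketch of the two metrics visible in this commit, Hallucination Rate (share of summaries whose HHEM score falls below 0.5) and Average Summary Length (mean word count); the scores and summaries are made-up example data, not leaderboard results:

```python
# Illustrative per-summary HHEM consistency scores and generated summaries.
hhem_scores = [0.92, 0.31, 0.77, 0.48, 0.85]
summaries = [
    "The senate passed the bill on Tuesday.",
    "Aliens endorsed the bill.",  # low score above -> counted as hallucinated
    "The bill now heads to the president's desk.",
    "Officials said the vote was unanimous.",
    "Lawmakers debated for six hours before voting.",
]

# Hallucination Rate: percentage of summaries scoring below the 0.5 threshold.
hallucination_rate = 100 * sum(score < 0.5 for score in hhem_scores) / len(hhem_scores)

# Average Summary Length: mean word count of the generated summaries.
avg_summary_length = sum(len(s.split()) for s in summaries) / len(summaries)

print(f"Hallucination Rate: {hallucination_rate:.1f}%")           # 40.0%
print(f"Average Summary Length: {avg_summary_length:.1f} words")  # 6.4 words
```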