
BERTopic-summcomparer-gauntlet-v0p1-sentence-t5-xl-summary

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Hierarchy of topics:

[Figure: topic hierarchy]

Usage

To use this model, please install BERTopic:

pip install -U -q bertopic safetensors

You can use the model as follows:

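from bertopic import BERTopic

# Load the fitted topic model from the Hugging Face Hub
topic_model = BERTopic.load("pszemraj/BERTopic-summcomparer-gauntlet-v0p1-sentence-t5-xl-summary")

# Interactive intertopic distance map (returns a Plotly figure)
topic_model.visualize_topics()

# For a DataFrame with one row per topic:
# topic_model.get_topic_info()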

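Predicting new instances:

# transform returns the predicted topic IDs and their probabilities, not embeddings
docs = ["Replace this with the text to classify."]
topics, probs = topic_model.transform(docs)
print(topics)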
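
Because transform returns integer topic IDs, it can help to map them back to the labels from the topic overview. The following is a minimal sketch, not part of BERTopic: `TOPIC_LABELS` and `describe_topics` are hypothetical names, and the dictionary holds only a few labels copied from this card.

```python
# Hypothetical helper: map topic IDs returned by transform() to readable labels.
# The labels are copied from this card's topic overview (subset only).
TOPIC_LABELS = {
    -1: "-1_no_saic_raw_sp_sep_4_sec_data",  # -1 is the outlier topic
    1: "1_cogvideo_videos_cogview2_cog",
    5: "5_nemo_dory_transcript_clownfish",
    10: "10_arendelle_elsa_frozen_kristoff",
}


def describe_topics(topic_ids, labels=TOPIC_LABELS):
    """Return one label per topic ID; IDs missing from `labels` get a placeholder."""
    return [labels.get(int(t), f"{int(t)}_<label not listed>") for t in topic_ids]


print(describe_topics([5, -1, 10]))
```

With the model loaded, the full ID-to-label mapping should be available from the `Topic` and `Name` columns of `topic_model.get_topic_info()`, so the hand-copied dictionary is only needed for illustration.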

Topic overview

  • Number of topics: 24
  • Number of training documents: 1960
Overview of all topics:

| Topic ID | Topic Keywords | Topic Frequency | Label |
|---|---|---|---|
| -1 | no_saic_raw_sp - sep_4 - sec - data - image | 13 | -1_no_saic_raw_sp_sep_4_sec_data |
| 0 | lecture - applications - methods - learning - topics | 104 | 0_lecture_applications_methods_learning |
| 1 | cogvideo - videos - cogview2 - cog - video | 303 | 1_cogvideo_videos_cogview2_cog |
| 2 | ship - rainsford - hunted - island - hunts | 117 | 2_ship_rainsford_hunted_island |
| 3 | films - dissertation - film - noir - identity | 106 | 3_films_dissertation_film_noir |
| 4 | linguistics - language - languages - foundational - systems | 104 | 4_linguistics_language_languages_foundational |
| 5 | nemo - dory - transcript - clownfish - fish | 103 | 5_nemo_dory_transcript_clownfish |
| 6 | train - bruno - washington - station - tennis | 102 | 6_train_bruno_washington_station |
| 7 | images - representations - image - captions - representation | 102 | 7_images_representations_image_captions |
| 8 | merge - merging - explain - concept - problems | 102 | 8_merge_merging_explain_concept |
| 9 | enhancement - enhancing - recordings - improve - waveforms | 100 | 9_enhancement_enhancing_recordings_improve |
| 10 | arendelle - elsa - frozen - kristoff - olaf | 99 | 10_arendelle_elsa_frozen_kristoff |
| 11 | scene - story - script - movie - gillis | 97 | 11_scene_story_script_movie |
| 12 | lecture - lemmatization - nlp - medical - techniques | 96 | 12_lecture_lemmatization_nlp_medical |
| 13 | questions - topics - conversation - terrance - talk | 85 | 13_questions_topics_conversation_terrance |
| 14 | sniper - kill - fury - combat - narrator | 81 | 14_sniper_kill_fury_combat |
| 15 | images - lecture - ezurich - pathology - medical | 67 | 15_images_lecture_ezurich_pathology |
| 16 | timeseries - framework - interpretability - representations - next_concept | 37 | 16_timeseries_framework_interpretability_representations |
| 17 | prediction - predictions - forecasting - predict - markov | 27 | 17_prediction_predictions_forecasting_predict |
| 18 | images - imaging - computational - convolutional - lecture | 27 | 18_images_imaging_computational_convolutional |
| 19 | technology - treatment - methods - medical - detection | 27 | 19_technology_treatment_methods_medical |
| 20 | novel - translation - henry - read - learn | 23 | 20_novel_translation_henry_read |
| 21 | abridged - brief - synopsis - short - citations | 22 | 21_abridged_brief_synopsis_short |
| 22 | lecture - pathology - medical - computational - patients | 16 | 22_lecture_pathology_medical_computational |
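
As a quick sanity check on the overview above, the per-topic frequencies can be summed and compared with the stated totals (24 topics, 1960 training documents). This is a small illustrative sketch; the counts are copied by hand from this card, and `TOPIC_FREQUENCY` is a hypothetical name rather than a BERTopic attribute.

```python
# Topic frequencies copied from the topic overview (topic ID -> number of documents).
TOPIC_FREQUENCY = {
    -1: 13, 0: 104, 1: 303, 2: 117, 3: 106, 4: 104, 5: 103, 6: 102,
    7: 102, 8: 102, 9: 100, 10: 99, 11: 97, 12: 96, 13: 85, 14: 81,
    15: 67, 16: 37, 17: 27, 18: 27, 19: 27, 20: 23, 21: 22, 22: 16,
}

total_docs = sum(TOPIC_FREQUENCY.values())
# Topic -1 collects documents that were not assigned to any cluster.
outlier_share = TOPIC_FREQUENCY[-1] / total_docs
largest_topic = max(TOPIC_FREQUENCY, key=TOPIC_FREQUENCY.get)

print(f"topics (including outlier topic -1): {len(TOPIC_FREQUENCY)}")
print(f"documents: {total_docs}")
print(f"outlier share: {outlier_share:.1%}")
print(f"largest topic: {largest_topic} ({TOPIC_FREQUENCY[largest_topic]} documents)")
```

The 24 rows, counting the outlier topic -1, add up to the 1960 training documents reported above.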

Training hyperparameters

  • calculate_probabilities: True
  • language: None
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: None
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: True
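
For reference, the settings above correspond to keyword arguments of the `BERTopic` constructor. The sketch below shows how an unfitted model with the same settings might be created. It is not the author's training script: `build_topic_model` is a hypothetical helper, and the embedding model is not stated in this card (the repository name suggests a sentence-t5-xl model, but that is an assumption), so it is left as a parameter.

```python
# Hyperparameters exactly as listed in this card.
HYPERPARAMS = {
    "calculate_probabilities": True,
    "language": None,
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": True,
}


def build_topic_model(embedding_model=None):
    """Create an unfitted BERTopic model with the settings above.

    `embedding_model` is NOT given in this card; pass the sentence
    embedding model you intend to use (assumption: a sentence-transformers
    model name or object).
    """
    # Imported lazily so this snippet can be read and imported without bertopic.
    from bertopic import BERTopic

    return BERTopic(embedding_model=embedding_model, **HYPERPARAMS)
```

Calling `build_topic_model(...)` followed by `fit_transform(docs)` on your own documents would train a new model; results will differ from this card unless the data, embedding model, and random state also match.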

Framework versions

  • Numpy: 1.22.4
  • HDBSCAN: 0.8.29
  • UMAP: 0.5.3
  • Pandas: 1.5.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.29.2
  • Numba: 0.56.4
  • Plotly: 5.13.1
  • Python: 3.10.11
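
Loading a serialized model can be sensitive to library versions, so it may be worth comparing the local environment with the versions listed above. The following is a minimal sketch using only the standard library; `EXPECTED_VERSIONS` and `find_version_mismatches` are hypothetical names, and the keys are the PyPI distribution names assumed to correspond to each entry.

```python
from importlib import metadata

# Versions listed in this card, keyed by (assumed) PyPI distribution name.
EXPECTED_VERSIONS = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.29",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.29.2",
    "numba": "0.56.4",
    "plotly": "5.13.1",
}


def find_version_mismatches(expected=EXPECTED_VERSIONS):
    """Return {package: (expected, installed)} for every package that differs.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


for package, (wanted, installed) in find_version_mismatches().items():
    print(f"{package}: card lists {wanted}, installed {installed}")
```

A mismatch does not necessarily mean the model will fail to load; the check only highlights where the environment differs from the one recorded here.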
