Quantifying Topic Model Influence on Text Layouts Based on Dimensionality Reductions

Daniel Atzberger, Tim Cech, Willy Scheibel, Jürgen Döllner, Tobias Schreck

Publication: Chapter in book/report/conference proceedings, conference contribution, peer-reviewed

Abstract

Text spatializations for text corpora often rely on two-dimensional scatter plots generated from topic models and dimensionality reductions. Topic models are unsupervised learning algorithms that identify clusters, so-called topics, within a corpus, representing the underlying concepts. Furthermore, topic models transform documents into vectors, capturing their association with topics. A subsequent dimensionality reduction creates a two-dimensional scatter plot, illustrating semantic similarity between the documents. A recent study by Atzberger et al. has shown that topic models are beneficial for generating two-dimensional layouts. However, in their study, the hyperparameters of the topic models are fixed, and thus the study does not analyze the impact of the topic models’ quality on the resulting layout. Following the methodology of Atzberger et al., we present a comprehensive benchmark comprising (1) text corpora, (2) layout algorithms based on topic models and dimensionality reductions, (3) quality metrics for assessing topic models, and (4) metrics for evaluating two-dimensional layouts’ accuracy and cluster separation. Our study involves an exhaustive evaluation of numerous parameter configurations, yielding a dataset that quantifies the quality of each dataset-layout algorithm combination. Through a rigorous analysis of this dataset, we derive practical guidelines for effectively employing topic models in text spatializations. As a main result, we conclude that the quality of a topic model measured by coherence is positively correlated with the layout quality in the case of Latent Semantic Indexing and Non-Negative Matrix Factorization.
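The pipeline described in the abstract (documents to topic vectors, then topic vectors to a two-dimensional layout) can be sketched roughly as follows. This is a minimal illustration, not the authors' benchmark code: the use of scikit-learn, the TF-IDF weighting, PCA as the dimensionality reduction, the toy corpus, the helper name `layout_from_topics`, and `trustworthiness` as a stand-in for a layout accuracy metric are all assumptions made for this example.

```python
from sklearn.decomposition import NMF, PCA
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.manifold import trustworthiness

# Hypothetical toy corpus (not from the paper) with two loose themes.
corpus = [
    "the compiler parses source code into a syntax tree",
    "the parser builds a syntax tree from source code tokens",
    "compiler optimizations rewrite the syntax tree",
    "source code tokens are read by the compiler front end",
    "the orchestra performs a symphony in the concert hall",
    "a violin solo opens the symphony at the concert",
    "the concert hall hosts an orchestra and a violin soloist",
    "the orchestra rehearses the symphony before the concert",
]


def layout_from_topics(docs, n_topics=3, seed=0):
    """Return (doc_topic, layout_2d) for a list of raw text documents.

    Hypothetical helper: NMF as the topic model, PCA as the reduction.
    """
    # Term weighting; TF-IDF is one common choice and is assumed here.
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)
    # Topic model: each document becomes a non-negative vector over topics.
    nmf = NMF(n_components=n_topics, init="nndsvda",
              random_state=seed, max_iter=500)
    doc_topic = nmf.fit_transform(tfidf)
    # Dimensionality reduction: one (x, y) position per document.
    layout_2d = PCA(n_components=2, random_state=seed).fit_transform(doc_topic)
    return doc_topic, layout_2d


doc_topic, layout_2d = layout_from_topics(corpus)

# Neighborhood preservation between topic space and layout, in [0, 1].
# n_neighbors must stay below half the number of documents.
score = trustworthiness(doc_topic, layout_2d, n_neighbors=2)
print(f"trustworthiness of the 2D layout: {score:.3f}")
```

Varying `n_topics` or swapping `NMF` for `TruncatedSVD` (a common way to compute Latent Semantic Indexing) and recording the score for each run gives a small-scale analogue of the parameter sweep the abstract describes.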

Original language: English
Title of host publication: Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications
Publisher: SciTePress
Pages: 593-602
Number of pages: 10
Volume: 1, GRAPP, HUCAPP and IVAPP
ISBN (electronic): 978-989-758-679-8
DOIs
Publication status: Published - 2024
Event: 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications: VISIGRAPP 2024 - Rome, Italy
Duration: 27 Feb. 2024 to 29 Feb. 2024

Conference

Conference: 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications
Short title: VISIGRAPP 2024
Country/Territory: Italy
City: Rome
Period: 27/02/24 to 29/02/24

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Computer Vision and Pattern Recognition
  • Human-Computer Interaction
