GSDMM Model Evaluation Techniques with Application to British Telecom Data

Research output: Chapter in Book/Report/Conference proceeding › Conference proceedings published in a book › peer-review


Abstract

Statistical topic modelling has become one of the most important tools in text processing, as more applications use it to handle the growing volume of available text data, e.g. from social media platforms. The aim of topic modelling is to discover the main themes, or topics, in a collection of text documents. While several models have been developed, there is no consensus on how to evaluate them or how to determine the best hyper-parameters. In this research, we develop a method for evaluating topic models for short text that employs word embeddings and measures within-topic variability and separation between topics. We focus on the Dirichlet Mixture Model and the tuning of its hyper-parameters. In empirical experiments, we present a case study on short-text datasets related to the British telecommunication industry. In particular, we find that the optimal hyper-parameter values obtained from our evaluation method do not agree with the fixed values typically used in the literature.
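
As a rough illustration of the kind of embedding-based evaluation the abstract describes, the sketch below scores a set of topics by within-topic variability and between-topic separation. It is not the authors' method: the abstract does not define either quantity, so the use of topic centroids, cosine distance, the function names, and the toy two-dimensional embeddings are all assumptions made for this example.

```python
import numpy as np


def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two vectors."""
    return 1.0 - float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def topic_centroid(words, embeddings):
    """Mean embedding of the topic words that have a vector."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    return np.mean(vectors, axis=0)


def within_topic_variability(words, embeddings):
    """Assumed definition: mean cosine distance of topic words to their centroid."""
    centroid = topic_centroid(words, embeddings)
    distances = [cosine_distance(embeddings[w], centroid)
                 for w in words if w in embeddings]
    return float(np.mean(distances))


def between_topic_separation(topics, embeddings):
    """Assumed definition: mean pairwise cosine distance between topic centroids."""
    centroids = [topic_centroid(ws, embeddings) for ws in topics.values()]
    distances = [cosine_distance(centroids[i], centroids[j])
                 for i in range(len(centroids))
                 for j in range(i + 1, len(centroids))]
    return float(np.mean(distances))


# Hypothetical toy embeddings; real use would load pretrained word vectors.
embeddings = {
    "broadband": np.array([1.0, 0.1]),
    "router": np.array([0.9, 0.2]),
    "wifi": np.array([1.0, 0.0]),
    "bill": np.array([0.1, 1.0]),
    "refund": np.array([0.0, 1.0]),
    "charge": np.array([0.2, 0.9]),
}

# Hypothetical top words per topic, e.g. taken from a fitted topic model.
topics = {
    0: ["broadband", "router", "wifi"],
    1: ["bill", "refund", "charge"],
}

variability = {k: within_topic_variability(ws, embeddings)
               for k, ws in topics.items()}
separation = between_topic_separation(topics, embeddings)

print("within-topic variability:", variability)
print("between-topic separation:", separation)
```

Under these assumed definitions, a candidate hyper-parameter setting would look better when variability is low and separation is high, so the two scores could be compared across a grid of settings.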

Original language: English
Title of host publication: 5th International Conference on Statistics
Subtitle of host publication: Theory and Applications, ICSTA 2023
Editors: Noelle Samia, Dirk Husmeier
Publisher: Avestia Publishing
ISBN (Print): 9781990800252
Publication status: Published - 2023
Event: 5th International Conference on Statistics: Theory and Applications, ICSTA 2023 - London, United Kingdom
Duration: 3 Aug 2023 to 5 Aug 2023

Publication series

Name: Proceedings of the International Conference on Statistics
ISSN (Electronic): 2562-7767

Conference

Conference: 5th International Conference on Statistics: Theory and Applications, ICSTA 2023
Country/Territory: United Kingdom
City: London
Period: 3/08/23 to 5/08/23

ASJC Scopus subject areas

  • Applied Mathematics
  • Computational Mathematics
  • Statistics and Probability
  • Theoretical Computer Science

Keywords

  • Gibbs Sampling Dirichlet Multinomial Mixture (GSDMM)
  • hyper-parameters tuning
  • model evaluation
  • telecommunication industry
  • topic modelling
