
Evaluate 3 different topic modeling algorithms

  • OCTIS version:
  • Python version: 3.7
  • Operating System: Linux

Description

I am a PhD candidate and I need to evaluate the performance of three topic modeling algorithms: LDA, LSI, and BERTopic (LDA and LSI were trained using the Gensim package). What relevance metrics should I use apart from the coherence score? I would like to include in my paper a table or graph that evaluates both the accuracy of the models (coherence score) and the relevance of the topics (should I use the topic diversity metric?). Thank you.

What I Did

Paste the command(s) you ran and the output.
If there was a crash, please include the traceback here.

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 5

Top GitHub Comments

1 reaction
silviatti commented, Mar 2, 2022

Which diversity metric are you using? Can you also show the snippet of code in which you call the metric? In general, a metric in OCTIS expects to receive as input the output of a Model(). Any topic model in OCTIS returns a dictionary with up to 4 fields; depending on the metric, the right field is used to compute the score (see here for the details on model_output). So if you want to use a metric that uses the word-topic distribution to compute the diversity, you would construct your model_output like this:

model_output = {"topic-word-matrix": topic_term_dist}

And then use it to compute the score of a metric. For example,

div = KLDivergence()
result = div.score(model_output)

Let me know if it works.

Silvia
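The snippet above assumes OCTIS is installed and that KLDivergence is imported from its evaluation metrics. Independently of OCTIS, the idea behind a distribution-based diversity metric of this kind can be sketched in plain Python (a minimal sketch of the general technique, not OCTIS's exact implementation): take the word-topic matrix from the model_output dictionary and average the pairwise KL divergences between the topics' word distributions, so that more divergent (more diverse) topics score higher.

```python
import math

def kl(p, q, eps=1e-12):
    """KL divergence between two discrete distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def pairwise_kl_diversity(topic_word_matrix):
    """Average pairwise KL divergence between topic-word distributions.

    topic_word_matrix: one row per topic, each row a probability
    distribution over the vocabulary (rows sum to 1).
    """
    k = len(topic_word_matrix)
    total, pairs = 0.0, 0
    for i in range(k):
        for j in range(k):
            if i != j:
                total += kl(topic_word_matrix[i], topic_word_matrix[j])
                pairs += 1
    return total / pairs

# Toy model_output in the dictionary format described above
model_output = {"topic-word-matrix": [
    [0.7, 0.1, 0.1, 0.1],   # topic 0 concentrates on word 0
    [0.1, 0.1, 0.1, 0.7],   # topic 1 concentrates on word 3
]}
print(pairwise_kl_diversity(model_output["topic-word-matrix"]))
```

Two identical topics give a score near 0; topics concentrated on different words give a large score.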

1 reaction
silviatti commented, Feb 24, 2022

Hello, it depends on what your objective is. Any evaluation metric focuses on a specific aspect of a topic model. OCTIS includes several categories of evaluation metrics:

  • topic coherence metrics, which evaluate whether the top words of each topic make sense together;
  • topic significance metrics, which consider the document-topic and word-topic distributions to distinguish high-quality topics from junk topics. You can find the reference paper here.
  • classification metrics (F1, accuracy, etc.), which use the document-topic distributions as features to train a classifier. These metrics require labelled documents.
  • diversity metrics, which consider the top words or the word-topic distribution and compute the distance between each topic and the others.

I am not sure whether BERTopic generates the document-topic and word-topic distributions (if it does not, you will not be able to compute the topic significance metrics). You might also consider Contextualized Topic Models (CTM), a topic model that, like BERTopic, uses pre-trained contextualized representations. CTM is part of OCTIS too.

Let me know if you have further questions,

Silvia
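On the original question of whether to report topic diversity: the simplest diversity metric works on the top words rather than on the full word-topic distribution — it is the proportion of unique words among the top-k words of all topics, so 1.0 means no topic shares a top word with any other. A minimal stand-alone sketch of that computation (the same idea as OCTIS's TopicDiversity metric, though not its code):

```python
def topic_diversity(topics, topk=10):
    """Proportion of unique words among the top-k words of all topics.

    topics: list of word lists, each ranked by probability (the "topics"
    field of an OCTIS model_output).
    """
    top_words = [word for topic in topics for word in topic[:topk]]
    return len(set(top_words)) / len(top_words)

topics = [
    ["price", "market", "stock", "trade"],
    ["gene", "cell", "protein", "dna"],
    ["price", "cell", "game", "team"],   # overlaps with both topics above
]
print(topic_diversity(topics, topk=4))  # 10 unique words out of 12
```

Reporting coherence (e.g. NPMI) and topic diversity side by side is a common pairing, which matches the table the question describes.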


Top Results From Across the Web

Evaluation Methods for Topic Models
In this paper we consider only the simplest topic model, latent Dirichlet allocation (LDA), and compare a number of methods for estimating the...
Topic Modelling Techniques - Medium
Brief Overview of different techniques used for topic modeling in NLP along with abstract code examples ... Have you ever had lots of...
An Evaluation of Topic Modelling Techniques for Twitter
In this paper, we complete an evaluation of various topic modelling algorithms, and examine their performance when working with Twitter tweets.
Topic Modeling: An Introduction - MonkeyLearn
In this guide, we're going to take a look at two types of topic analysis techniques: topic modeling and topic classification. Topic modeling...
Using Topic Modeling Methods for Short-Text Data - Frontiers
The paper sheds light on some common topic modeling methods in a ... “Performance evaluation of topic modeling algorithms for text ...
