Comparing Topological Communities and Communities of Interest Using Topic Modeling
Publication date
Authors
DOI
Document Type
Master Thesis
License
CC-BY-NC-ND
Abstract
In this thesis, I propose repurposing Latent Dirichlet Allocation (LDA), a topic modeling algorithm, to discover communities of interest. To test the approach, I apply it to the social news and entertainment website reddit. I then compare the composition of communities of interest with that of topological communities, that is, communities discovered from the topology of social graphs. I apply both methods to the Enron email corpus and compare their results using cluster evaluation methods.
Keywords
topic modeling; Latent Dirichlet Allocation; LDA; machine learning; unsupervised learning; communities; community of interest; topological community; graph; social graph; reddit; Enron; mutual information; normalised mutual information; NMI; Jaccard Index; cluster validation; information theory
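
The abstract and keywords name normalised mutual information (NMI) and the Jaccard Index as cluster evaluation methods for comparing the two kinds of communities. The following is a minimal sketch of how two community assignments over the same set of members could be compared with these measures. It is an illustration only, not the thesis's implementation: the function names, the toy label lists, the geometric-mean normalisation of NMI, the pair-counting form of the Jaccard Index, and the assumption that every member belongs to exactly one community are all assumptions made here.

```python
from collections import Counter
from itertools import combinations
from math import log, sqrt


def entropy(labels):
    """Shannon entropy (in nats) of a hard community assignment."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def nmi(a, b):
    """Normalised mutual information between two assignments.

    Assumption: normalised by the geometric mean of the two entropies;
    other normalisations (arithmetic mean, max, min) are also in use.
    """
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    mi = sum(
        (c / n) * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0 or h_b == 0:
        # Degenerate case: at least one assignment has a single community.
        return 1.0 if h_a == h_b else 0.0
    return mi / sqrt(h_a * h_b)


def pair_jaccard(a, b):
    """Pair-counting Jaccard Index between two assignments.

    Counts member pairs placed in the same community by both assignments,
    divided by pairs placed together by at least one of them.
    """
    both = either = 0
    for i, j in combinations(range(len(a)), 2):
        same_a = a[i] == a[j]
        same_b = b[i] == b[j]
        both += same_a and same_b
        either += same_a or same_b
    return both / either if either else 1.0


# Hypothetical toy data: one community label per member, same member order.
topological = [0, 0, 0, 1, 1, 1]
interest = [1, 1, 1, 0, 0, 0]  # same grouping, different label names

print(nmi(topological, interest))
print(pair_jaccard(topological, interest))
```

Both measures depend only on how members are grouped, not on the label names, so they can compare a graph-based partition with a topic-based one directly. Because LDA yields a distribution over topics for each member, some step to turn it into a single community label (for example, taking the most probable topic) would be needed before applying this sketch; that step is likewise an assumption here.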