Performance Improving Topic Modeling with Big Data Environment

ธนกร ญาณกาย
วนิดา แก่นอากาศ

Abstract

Data mining is a method for discovering knowledge in data. Many techniques exist for extracting knowledge from text data, such as document summarization, latent meaning extraction, topic discovery, and document clustering. Latent Dirichlet Allocation (LDA) is an algorithm for finding the hidden topics of documents, and its performance can be improved by parameter tuning. We use Ant Colony Optimization (ACO) to optimize the LDA parameters. However, computing topics over a large number of documents takes a long time. In this work, we apply the map-reduce programming technique in a Hadoop environment to reduce the computation time. The results show that processing documents using LDA with ACO-optimized parameters under Hadoop is clearly faster, and performs considerably better, than the same approach without map-reduce.
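
The abstract does not say how the LDA computation is split across Hadoop tasks. As a rough illustration of the map-reduce pattern only, the sketch below counts term frequencies over a small set of documents in plain Python: each document is mapped to a partial count table, and the tables are then reduced into one. The function names and the sample documents are hypothetical and are not taken from the paper; a real Hadoop job would distribute these two phases across nodes.

```python
from collections import Counter
from functools import reduce


def map_phase(document: str) -> Counter:
    """Map step (illustrative): emit term counts for one document."""
    return Counter(document.lower().split())


def reduce_phase(left: Counter, right: Counter) -> Counter:
    """Reduce step (illustrative): merge two partial count tables."""
    return left + right


def term_counts(documents):
    """Run the map step on every document, then fold the partial results."""
    partials = map(map_phase, documents)
    return reduce(reduce_phase, partials, Counter())


# Hypothetical sample corpus, used only to exercise the sketch.
docs = [
    "topic models find hidden topics",
    "hidden topics in big data",
]
counts = term_counts(docs)
```

Because the map step treats each document independently, the documents can be processed in parallel, which is the property the map-reduce approach relies on for its speed-up.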

Article Details

Section
Research paper