Improving Performance of Document Clustering Using Latent Semantic Index Approach

Lailil Muflikhah, Baharudin B. Baharum

Abstract


Document clustering helps users retrieve the information they need. Clustering was originally introduced as a way to improve precision and recall in information retrieval. In this study, a fuzzy clustering method is used to categorize document collections. Document clustering involves large volumes of data that may be correlated both within and across documents, and these patterns can be found using the Latent Semantic Index (LSI) approach. Two methods are used in this research: Singular Value Decomposition (SVD) and Principal Component Analysis (PCA), where PCA is treated as an extension of the SVD method that uses the data covariance. The aim of the study is to improve the performance of an existing clustering algorithm (fuzzy c-means) by reducing the matrix dimension, which in turn contributes to better document categorization quality. Experiments over various data volumes (class sizes) and topics show a significant improvement in both internal and external cluster quality.

Keywords: document clustering, Latent Semantic Index, SVD, PCA, fuzzy c-means
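
The pipeline summarized in the abstract (reduce the document matrix with SVD or PCA, then cluster with fuzzy c-means) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the toy document-by-term count matrix, the reduced dimension k = 2, and the fuzzifier m = 2 are not taken from the paper, and the code uses plain NumPy rather than the authors' implementation.

```python
import numpy as np

def svd_reduce(X, k):
    """Keep the top-k singular directions of X (rows = documents, columns = terms)."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]  # document coordinates in the k-dimensional latent space

def pca_reduce(X, k):
    """PCA via SVD of the column-centered matrix (equivalent to using the data covariance)."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * s[:k]

def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means; returns the membership matrix and the cluster centers."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)  # each row of memberships sums to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]  # membership-weighted means
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # guard against division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, centers

# Hypothetical toy data: 6 documents, 6 terms, two loosely separated topics.
docs = np.array([
    [3, 2, 1, 0, 0, 0],
    [2, 3, 2, 0, 0, 0],
    [3, 3, 1, 0, 1, 0],
    [0, 0, 0, 3, 2, 2],
    [0, 1, 0, 2, 3, 3],
    [0, 0, 0, 3, 3, 2],
], dtype=float)

Z_svd = svd_reduce(docs, k=2)
Z_pca = pca_reduce(docs, k=2)
memberships, centers = fuzzy_c_means(Z_svd, c=2)
labels = memberships.argmax(axis=1)  # hard labels taken from the fuzzy memberships
```

Clustering in the reduced space means each fuzzy c-means iteration works on k columns instead of the full vocabulary, which is the dimension simplification the abstract refers to. Swapping `Z_svd` for `Z_pca` in the final call gives the PCA variant.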

