M.U. Arshad and M.N. Ayyaz (Pakistan)
Data mining, knowledge discovery, multi-dimensional databases
Clustering, also known as unsupervised classification, aims to group data such that intra-group distances are minimized and inter-group distances are maximized. Most clustering algorithms use all dimensions of the feature/attribute space to partition objects into groups. However, recent research suggests that clustering in high-dimensional spaces should instead search for hidden subspaces of lower dimensionality, because data are more likely to form dense clusters in a low-dimensional subspace than in the full space. In this paper, we present a new, fast, and scalable algorithm, ProjClusID, for the projective clustering problem. We use the concept of frequent itemset mining to find projective clusters; to do so, we discretize the data, mapping it from the continuous to a discrete domain. Our algorithm is both density-based and grid-based, and it finds a potentially optimal clustering without requiring any parameter input. As a post-clustering step, the data is mapped back to its original continuous domain. Our experimental results on synthetic and real datasets show that the ProjClusID algorithm improves on the accuracy and effectiveness of previous techniques.
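The general idea of finding dense subspace regions through discretization and frequent itemset mining can be illustrated with a minimal sketch. This is not the ProjClusID algorithm, whose details the abstract does not give. It assumes equal-width binning and a plain Apriori-style search, and the names `discretize`, `dense_subspace_cells`, `n_bins`, `min_support`, and `max_dims` are hypothetical. Unlike ProjClusID, which is stated to need no parameter input, this sketch takes the bin count and support threshold as explicit arguments.

```python
from collections import Counter
from itertools import combinations


def discretize(points, n_bins):
    """Map every point to a set of (dimension, bin) items using equal-width bins.

    Equal-width binning is an assumption of this sketch, not the paper's method.
    """
    dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]
    rows = []
    for p in points:
        items = set()
        for d in range(dims):
            width = (highs[d] - lows[d]) / n_bins
            if width == 0:
                width = 1.0  # constant dimension: everything falls in bin 0
            # Clamp so the maximum value lands in the last bin.
            bin_index = min(int((p[d] - lows[d]) / width), n_bins - 1)
            items.add((d, bin_index))
        rows.append(frozenset(items))
    return rows


def support(cell, rows):
    """Count the points whose discretized form contains every item of `cell`."""
    return sum(1 for row in rows if cell <= row)


def dense_subspace_cells(rows, min_support, max_dims):
    """Apriori-style search for grid cells, in any subspace, holding enough points.

    Returns a dict mapping each frequent itemset (a grid cell in some subspace)
    to its support. Each point has exactly one bin per dimension, so a candidate
    with two bins of the same dimension has support 0 and is pruned.
    """
    counts = Counter(item for row in rows for item in row)
    frequent = {frozenset([item]) for item, c in counts.items() if c >= min_support}
    result = {}
    k = 1
    while frequent and k <= max_dims:
        for cell in frequent:
            result[cell] = support(cell, rows)
        # Join pairs of frequent k-itemsets into (k + 1)-item candidates.
        candidates = {a | b for a, b in combinations(frequent, 2) if len(a | b) == k + 1}
        frequent = {c for c in candidates if support(c, rows) >= min_support}
        k += 1
    return result


# Toy data: four points are close together in dimensions 0 and 1 only;
# dimension 2 is spread out, so the cluster lives in a 2-D subspace.
points = [
    (0.0, 0.1, 0.0),
    (0.1, 0.0, 0.3),
    (0.2, 0.2, 0.6),
    (0.1, 0.1, 1.0),
    (0.9, 0.4, 0.1),
    (0.6, 1.0, 0.8),
    (1.0, 0.7, 0.4),
]
cells = dense_subspace_cells(discretize(points, n_bins=4), min_support=4, max_dims=3)
for cell, count in sorted(cells.items(), key=lambda kv: len(kv[0])):
    print(sorted(cell), count)
```

On this toy data, the largest frequent itemset combines bin 0 of dimension 0 with bin 0 of dimension 1, and no item from dimension 2 is frequent, which is the sense in which the cluster is "projected" onto a subspace. A full method would also merge adjacent dense cells into clusters and map bin indices back to continuous intervals; both steps are omitted here.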