CLUSTERING GENE EXPRESSION DATA USING AN EFFECTIVE DISSIMILARITY MEASURE

R. Das, D.K. Bhattacharyya, and J.K. Kalita

Keywords

Gene expression, dissimilarity measure, clustering, density-based, frequent itemset mining, nearest neighbour

Abstract

This paper presents two clustering methods: the first uses a density-based approach (DGC), and the second uses a frequent itemset mining approach (FINN). DGC uses regulation information as well as order-preserving ranking to identify relevant clusters in gene expression data. FINN exploits frequent itemsets and uses a nearest neighbour approach to cluster gene sets. Both methods use a novel dissimilarity measure discussed in the paper. The clustering methods were evaluated on real-life datasets and were found to perform satisfactorily. They were also compared with some well-known clustering algorithms and found to perform well in terms of the homogeneity, silhouette, and z-score cluster validity measures.
