The Fuzzy Gene Filter: A Classifier Performance Assessment

Meir Perez and Tshilidzi Marwala

Keywords

Classifier, Feature Selection, Fuzzy Gene Filter, Microarray

Abstract

The Fuzzy Gene Filter (FGF) is an optimised Fuzzy Inference System designed to rank genes in order of differential expression, based on expression data generated in a microarray experiment. This paper examines the effectiveness of the FGF for feature selection across several classification architectures. The FGF is compared with three of the most common gene ranking algorithms: the t-test, the Wilcoxon test and ROC curve analysis. Four classifiers are used to compare the performance of the FGF with that of the standard approaches: K-Nearest Neighbour (KNN), Support Vector Machine (SVM), Naïve Bayesian Classifier (NBC) and Artificial Neural Network (ANN). A nested stratified Leave-One-Out Cross-Validation scheme is used to identify the optimal number of top-ranking genes, as well as the optimal classifier parameters. Two microarray data sets are used for the comparison: a prostate cancer data set and a lymphoma data set.
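
The three baseline rankers named above each reduce a gene to a single two-class statistic. The following is a minimal, standard-library-only sketch, not the authors' code. It assumes two classes labelled 0 and 1 and a samples-by-genes list of lists. The specific variants are also assumptions, since the abstract does not say which were used: Welch's t statistic, the Wilcoxon rank-sum test expressed as a Mann-Whitney U, and scoring ROC analysis by |AUC − 0.5|. The helper names (`rank_genes`, `average_ranks`, and so on) are hypothetical. The FGF itself is not implemented here.

```python
import math
from statistics import mean, variance


def t_statistic(a, b):
    """Welch's two-sample t statistic for one gene (class a versus class b)."""
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))  # sample variances
    if se == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def average_ranks(values):
    """1-based ranks of `values`, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Wilcoxon rank-sum test statistic, expressed as Mann-Whitney U for class a."""
    ranks = average_ranks(list(a) + list(b))
    return sum(ranks[: len(a)]) - len(a) * (len(a) + 1) / 2


def roc_auc(a, b):
    """Area under the ROC curve when expression scores class a against class b."""
    return mann_whitney_u(a, b) / (len(a) * len(b))


def rank_genes(X, y, method="t"):
    """Gene indices from most to least differentially expressed.

    X is a list of samples (each a list of per-gene expression values);
    y holds the class labels, assumed to be 0 or 1.
    """
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, lab in zip(X, y) if lab == 1]
        b = [row[g] for row, lab in zip(X, y) if lab == 0]
        if method == "t":
            s = abs(t_statistic(a, b))
        elif method == "wilcoxon":
            s = abs(mann_whitney_u(a, b) - len(a) * len(b) / 2)  # distance from null mean
        elif method == "roc":
            s = abs(roc_auc(a, b) - 0.5)  # AUC far from 0.5 in either direction
        else:
            raise ValueError(f"unknown method: {method}")
        scores.append(s)
    return sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)


if __name__ == "__main__":
    # Toy data: gene 0 separates the classes; genes 1 and 2 do not.
    X = [[1.0, 5.0, 0.3], [1.2, 5.1, 0.1], [0.9, 4.9, 0.2],
         [3.0, 5.0, 0.25], [3.2, 5.2, 0.15], [2.9, 4.8, 0.3]]
    y = [0, 0, 0, 1, 1, 1]
    for method in ("t", "wilcoxon", "roc"):
        print(method, rank_genes(X, y, method))
```

Each baseline scores a gene from one statistic of its two class distributions. By contrast, the abstract attributes part of the FGF's advantage to incorporating multiple features of the data when ranking genes.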

Genes ranked by the FGF attained higher accuracies for all of the classifiers tested, on both data sets. The improvement was statistically significant at the 0.05 level for the prostate data set (p = 0.0231) but not for the lymphoma data set (p = 0.1888). On the prostate data set, the FGF performed best with the KNN classifier, achieving an accuracy of 96.1% using the top 9 ranking genes. On the lymphoma data set, it performed best with the SVM classifier, achieving an accuracy of 100% using the top 12 ranking genes. The FGF's performance is attributed to two properties: it is optimised to rank genes so as to maximise class separability, and it incorporates multiple features of the data when ranking genes.
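
Accuracies such as these depend on how the number of top-ranking genes is chosen. The abstract states that a nested leave-one-out scheme is used for this. What follows is a minimal sketch of such a loop under several assumptions, all of them labelled:

- It uses plain (unstratified) leave-one-out splits.
- It ranks genes by |Welch t| as a stand-in ranker, not the FGF.
- It uses a Euclidean KNN with k = 3 held fixed rather than tuned.
- The function names are hypothetical.

Genes are re-ranked on every training split, so the held-out sample never influences which genes are selected.

```python
import math
from collections import Counter
from statistics import mean, variance


def rank_by_t(X, y):
    """Gene indices ordered by |Welch t|; a stand-in ranker, not the FGF."""
    def abs_t(g):
        a = [row[g] for row, lab in zip(X, y) if lab == 1]
        b = [row[g] for row, lab in zip(X, y) if lab == 0]
        se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
        return abs(mean(a) - mean(b)) / se if se > 0 else 0.0
    return sorted(range(len(X[0])), key=abs_t, reverse=True)


def knn_predict(train_X, train_y, x, genes, k=3):
    """Majority label among the k training samples nearest to x (Euclidean, chosen genes only)."""
    def dist(row):
        return math.sqrt(sum((row[g] - x[g]) ** 2 for g in genes))
    nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i]))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def loocv_accuracy(X, y, n_genes, k=3):
    """Leave-one-out accuracy, re-ranking genes on each training split."""
    correct = 0
    for i in range(len(X)):
        tr_X, tr_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        genes = rank_by_t(tr_X, tr_y)[:n_genes]
        correct += knn_predict(tr_X, tr_y, X[i], genes, k) == y[i]
    return correct / len(X)


def nested_loocv(X, y, candidate_sizes, k=3):
    """Outer LOO estimates accuracy; an inner LOO on each outer training set
    picks how many top-ranking genes to keep. Returns (accuracy, chosen sizes)."""
    correct, chosen = 0, []
    for i in range(len(X)):
        tr_X, tr_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        # Inner loop: score each candidate gene count on the outer training set only.
        best = max(candidate_sizes, key=lambda m: loocv_accuracy(tr_X, tr_y, m, k))
        genes = rank_by_t(tr_X, tr_y)[:best]
        correct += knn_predict(tr_X, tr_y, X[i], genes, k) == y[i]
        chosen.append(best)
    return correct / len(X), chosen
```

Ties in inner accuracy go to the candidate listed first, because Python's `max` returns the first maximal element. That is a design choice of this sketch. Listing candidate sizes in increasing order therefore favours smaller gene sets.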
