Heuristic Optimization for Feature Selection in Microarray Gene Expression Data for Cancer Classification

C., Gunavathi; Kandhasamy, Premalatha

Cancer classification is a crucial area of research in bioinformatics. Microarray technology has had a great impact on cancer research, and gene expression data are widely used to identify marker genes related to a particular disease. For clinical applications, identifying a small number of relevant genes can lead to cost-effective treatment and is essential for predicting a patient’s survival time or diagnosing the cancer. High dimensionality is an important problem in microarray data, which contain a great number of genes and a relatively small number of samples. Data mining is used to find interesting patterns and gain knowledge from large volumes of data. Classification, one of the most important techniques in data mining, predicts the class labels of unknown data. Feature selection methods are used in classification to remove noisy and irrelevant attributes and to improve the performance of the classifier. Data mining techniques have proven useful in understanding gene function, gene regulation, cellular processes and subtypes of cells. In this research work, the genes of the microarray data are ranked using statistical measures, namely T-statistics, signal-to-noise ratio (SNR) and F-statistics. The optimization techniques Genetic algorithm (GA), Particle swarm optimization (PSO), Cuckoo search (CS) and Shuffled frog leaping with Lévy flight (SFLLF) are employed to find informative features among the top-m ranked genes. The k-Nearest neighbor classifier (kNN), Support vector machine (SVM) and Naïve Bayes classifier (NBC) are used for classification. In the context of feature selection from microarray data, an encoding solution represents the genes that are selected for classification. Each encoding solution is represented as a chromosome in GA, a particle in PSO, an egg in CS and a frog in SFLLF.
Ten publicly available gene expression datasets, namely CNS, DLBCL Harvard, DLBCL outcome, Lung Cancer Michigan, Ovarian Cancer, Prostate outcome, AMLALL, Colon Tumor, Lung Harvard2 and Prostate, are used for the experimental analysis.
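
The ranking and encoding steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the SNR form |mean0 - mean1| / (sd0 + sd1), the two-class restriction, the 0/1 mask as the encoding solution, and the leave-one-out 1-nearest-neighbour fitness are all assumptions made for illustration, and the function names (`snr_scores`, `top_m_genes`, `loo_1nn_accuracy`, `fitness`) are hypothetical.

```python
import numpy as np

def snr_scores(X, y):
    """Signal-to-noise ratio per gene for a two-class problem.

    X: (n_samples, n_genes) expression matrix; y: labels in {0, 1}.
    Assumed form: |mean0 - mean1| / (sd0 + sd1).
    """
    X0, X1 = X[y == 0], X[y == 1]
    mean_diff = np.abs(X0.mean(axis=0) - X1.mean(axis=0))
    sd_sum = X0.std(axis=0, ddof=1) + X1.std(axis=0, ddof=1)
    return mean_diff / (sd_sum + 1e-12)  # small constant avoids division by zero

def top_m_genes(X, y, m):
    """Indices of the m highest-scoring genes, best first."""
    return np.argsort(snr_scores(X, y))[::-1][:m]

def loo_1nn_accuracy(X, y):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample may not be its own neighbour
    return float(np.mean(y[dist.argmin(axis=1)] == y))

def fitness(mask, X, y, ranked):
    """Score one encoding solution: a 0/1 mask over the top-m ranked genes."""
    chosen = ranked[np.asarray(mask, dtype=bool)]
    if chosen.size == 0:
        return 0.0  # an empty gene subset cannot classify anything
    return loo_1nn_accuracy(X[:, chosen], y)

if __name__ == "__main__":
    # Synthetic data only: genes 0..2 are made informative by construction.
    rng = np.random.default_rng(0)
    y = np.array([0] * 10 + [1] * 10)
    X = rng.normal(size=(20, 50))
    X[y == 1, :3] += 5.0
    ranked = top_m_genes(X, y, m=10)
    mask = np.zeros(10, dtype=int)
    mask[:3] = 1  # select the three best-ranked genes
    print(ranked[:3], fitness(mask, X, y, ranked))
```

In this sketch a GA chromosome, PSO particle, CS egg or SFLLF frog would each carry one such mask, and the optimizer would search for the mask with the highest fitness; the search loop itself is omitted here.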

Authors

Gunavathi C. (Author)

Dr. Premalatha Kandhasamy (Author)