Heuristic Optimization for Feature Selection in Microarray Gene Expression Data for Cancer Classification

Textbook 2017 131 Pages

Summary

Cancer classification is a crucial area of research in bioinformatics. Microarray technology has had a great impact on cancer research, and gene expression data are widely used to identify marker genes related to a particular disease. For clinical applications, identifying a small number of relevant genes can make diagnosing the cancer and predicting a patient's survival time more cost-effective. High dimensionality is a central problem in microarray data, which contain a very large number of genes and a relatively small number of samples. Data mining is used to find interesting patterns and gain knowledge from large volumes of data, and its techniques have proven useful in understanding gene function, gene regulation, cellular processes and subtypes of cells. Classification, one of the most important data mining techniques, assigns class labels to previously unseen data. Feature selection methods are used in classification to remove noisy and irrelevant attributes and to improve the performance of the classifier. In this research work, the genes of the microarray data are ranked using statistical measures, namely T-statistics, signal-to-noise ratio (SNR) and F-statistics. The optimization techniques genetic algorithm (GA), particle swarm optimization (PSO), cuckoo search (CS) and shuffled frog leaping with Lévy flight (SFLLF) are then employed to find informative features among the top-m ranked genes. The k-nearest neighbor classifier (kNN), support vector machine (SVM) and Naïve Bayes classifier (NBC) are used for classification. In the context of feature selection from microarray data, an encoding solution represents the genes that are selected for classification. Each encoding solution is represented as a chromosome in GA, a particle in PSO, an egg in CS and a frog in SFLLF.
Ten publicly available gene expression datasets are used for the experimental analysis: CNS, DLBCL Harvard, DLBCL outcome, Lung Cancer Michigan, Ovarian Cancer, Prostate outcome, AMLALL, Colon Tumor, Lung Harvard2 and Prostate.
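
The two-stage approach described in the summary (rank genes, then search the top-m genes for an informative subset) can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration, not the book's implementation: the SNR formula |mean_0 - mean_1| / (std_0 + std_1) is one common two-class definition, the encoding is assumed to be a binary mask over the top-m genes, the fitness is assumed to be leave-one-out accuracy of a 1-nearest-neighbor classifier, and the function names `snr_scores`, `top_m_genes`, `loo_1nn_accuracy` and `fitness` are hypothetical.

```python
import numpy as np


def snr_scores(X, y):
    """Per-gene signal-to-noise ratio for a two-class problem.

    X has shape (n_samples, n_genes); y holds 0/1 class labels.
    Assumed formula: |mean_0 - mean_1| / (std_0 + std_1).
    """
    a, b = X[y == 0], X[y == 1]
    eps = 1e-12  # avoids division by zero for constant genes
    spread = a.std(axis=0, ddof=1) + b.std(axis=0, ddof=1) + eps
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / spread


def top_m_genes(X, y, m):
    """Indices of the m highest-scoring genes, best first."""
    return np.argsort(snr_scores(X, y))[::-1][:m]


def loo_1nn_accuracy(X, y):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier."""
    # Pairwise squared Euclidean distances between all samples.
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)  # a sample may not be its own neighbor
    nearest = d.argmin(axis=1)
    return float((y[nearest] == y).mean())


def fitness(mask, X_top, y):
    """Score one encoding solution (assumed: a binary mask over top-m genes).

    A chromosome, particle, egg or frog would each carry such a mask;
    the heuristic search itself is not shown here.
    """
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        return 0.0  # an empty gene subset cannot classify anything
    return loo_1nn_accuracy(X_top[:, mask], y)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = np.array([0] * 10 + [1] * 10)
    X = rng.normal(size=(20, 50))
    X[y == 1, 3] += 4.0  # make one synthetic gene informative
    top = top_m_genes(X, y, m=5)
    candidate = [1, 0, 0, 0, 0]  # keep only the best-ranked gene
    print(top, fitness(candidate, X[:, top], y))
```

A search method such as GA or PSO would repeatedly propose masks and keep those with higher `fitness`; in practice the fitness often also rewards smaller subsets, which this sketch omits.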

Details

Pages
131
Type of Edition
First edition
Year
2017
File size
1.7 MB
Language
English
Catalog Number
v358443
Institution / College
VIT University – VIT
Grade
9.0
Tags
Bioinformatics, Microarray technology, Marker, Clinical application, Data mining, Microarray data, Precautions against cancer, Functional genomics
