Feature Selection and Binary Classification Using Microarray Data
DNA microarray technology has greatly influenced biomedical research, with the hope of significantly improving the diagnosis and treatment of diseases. Microarrays can measure the expression levels of thousands of genes simultaneously. This research presents a rigorous and systematic comparison of statistical classifiers that combine supervised learning methods with either univariate or genetic algorithm-based multivariate feature subset selection (FSS) techniques, all applied to six published two-class microarray datasets. These analyses provide insights into univariate vs. multivariate FSS and into how to obtain realistic and honest misclassification error rates via cross-validation (CV). Ultimately, this research puts the more traditional implementations of CV and FSS to the test and provides a solid foundation on which these topics can and should be further investigated when performing limited-sample classification using high-dimensional gene expression data. The analyses presented should be especially useful to researchers in the fields of bioinformatics, supervised learning, and variable selection, as well as anyone interested in microarray analysis.
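To make the point about honest CV error rates concrete, here is a minimal sketch in Python. It is not the book's protocol: the data are synthetic noise, the univariate filter is a simple t-like score, the classifier is a nearest-centroid rule, and all function names (`t_scores`, `select_top_k`, `cv_error`) are hypothetical. It contrasts selecting features once on the full dataset before CV with repeating the selection inside each training fold. Because the features here carry no signal, an honest estimate should sit near chance, while selection done outside the CV loop tends to look optimistic.

```python
import random
from statistics import mean, pvariance


def t_scores(X, y):
    """Absolute t-like score per feature for a two-class problem (labels 0/1)."""
    a = [row for row, label in zip(X, y) if label == 0]
    b = [row for row, label in zip(X, y) if label == 1]
    scores = []
    for j in range(len(X[0])):
        ca = [row[j] for row in a]
        cb = [row[j] for row in b]
        se = (pvariance(ca) / len(ca) + pvariance(cb) / len(cb)) ** 0.5
        scores.append(abs(mean(ca) - mean(cb)) / (se + 1e-12))
    return scores


def select_top_k(X, y, k):
    """Univariate FSS: indices of the k highest-scoring features."""
    scores = t_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]


def nearest_centroid_predict(X_train, y_train, X_test, features):
    """Assign each test sample to the class with the closer centroid."""
    centroids = {}
    for label in (0, 1):
        rows = [r for r, l in zip(X_train, y_train) if l == label]
        centroids[label] = [mean(r[j] for r in rows) for j in features]
    preds = []
    for r in X_test:
        dist = {
            label: sum((r[j] - c) ** 2 for j, c in zip(features, cen))
            for label, cen in centroids.items()
        }
        preds.append(min(dist, key=dist.get))
    return preds


def cv_error(X, y, k_features, n_folds=5, select_inside=True, seed=0):
    """CV misclassification rate.

    select_inside=True  -> features re-selected on each training fold.
    select_inside=False -> features selected once on ALL samples (leaks
                           held-out information into the selection step).
    """
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    fixed = None if select_inside else select_top_k(X, y, k_features)
    errors = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        feats = select_top_k(X_tr, y_tr, k_features) if select_inside else fixed
        preds = nearest_centroid_predict(X_tr, y_tr, [X[i] for i in fold], feats)
        errors += sum(p != y[i] for p, i in zip(preds, fold))
    return errors / len(X)


# Synthetic "microarray": 40 samples, 500 pure-noise genes, balanced labels.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(500)] for _ in range(40)]
y = [i % 2 for i in range(40)]

honest = cv_error(X, y, k_features=10, select_inside=True)
biased = cv_error(X, y, k_features=10, select_inside=False)
print(f"selection inside CV folds: {honest:.2f}")
print(f"selection before CV:       {biased:.2f}")
```

The only difference between the two runs is where the selection step sits relative to the held-out samples. Any FSS method, univariate or multivariate, could be substituted for `select_top_k` without changing that structure.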
Michael L. Lecocke, Ph.D.: Studied microarray gene expression analysis at Rice University and M.D. Anderson Cancer Center, Houston, TX. Currently Assistant Professor in the Department of Mathematics at St. Mary's University and Adjunct Assistant Professor in the Department of Epidemiology and Biostatistics at University of Texas Health Science Center, San Antonio, TX.