Abstract
Gene selection is crucial for cancer classification using microarray data. To improve cancer
classification accuracy, we developed a new wrapper method for gene selection called ieGENES.
First, we proposed a parsimonious kernel machine regularization (PKMR) model that applies
ridge regularization in kernel-machine-driven classification to tackle multicollinearity and obtain stable
estimates in high-dimensional settings. Then the ieGENES algorithm was developed to optimally identify
relevant genes while iteratively eliminating redundant ones based on leave-one-out cross-validation accuracy.
In particular, we developed a new methodology to optimally update model parameters upon gene removal.
The ieGENES algorithm was evaluated on six cancer microarray datasets and compared to existing methods.
Classification accuracy and the number of differentially expressed genes (DEGs) identified were assessed. In
terms of gene selection accuracy, ieGENES outperformed multiple wrapper methods on five of the six
datasets (Colon, Leukemia, Hepato, Glioma, and Breast Cancers), with statistically significant improvements
(𝑝 < 0.001). For the Colon dataset, ieGENES achieved 96.21% accuracy with 167 DEGs. The proposed ieGENES
technique demonstrated superior performance in identifying DEGs for cancer diagnosis compared with
existing techniques. It offers a promising tool for identifying biologically relevant genes in microarray data
analysis and biomarker discovery for cancer research.
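
The two ingredients named in the abstract, a ridge-regularized kernel classifier and backward gene elimination driven by leave-one-out cross-validation (LOOCV) accuracy, can be illustrated with a minimal sketch. This is not the authors' PKMR model or ieGENES algorithm: the Gaussian kernel, the bandwidth `1/p`, the penalty `lam`, the greedy stopping rule, and all function names are assumptions made for illustration. In particular, the sketch refits the model from scratch after every candidate removal, whereas the paper describes a dedicated method for updating parameters upon gene removal.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix for the rows of X (kernel choice is an assumption)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-gamma * d2)

def loo_accuracy(X, y, lam=1.0):
    """LOOCV accuracy of a kernel ridge classifier with labels in {-1, +1}.

    Uses the closed-form leave-one-out residual of a linear smoother,
    e_i / (1 - H_ii), so no explicit refitting per held-out sample is needed.
    """
    n, p = X.shape
    K = rbf_kernel(X, gamma=1.0 / p)            # bandwidth 1/p is an assumption
    H = K @ np.linalg.inv(K + lam * np.eye(n))  # hat matrix of kernel ridge
    loo_fit = y - (y - H @ y) / (1.0 - np.diag(H))
    return float(np.mean(np.sign(loo_fit) == y))

def backward_eliminate(X, y, lam=1.0, min_genes=1):
    """Greedily drop the gene whose removal gives the best LOOCV accuracy.

    Stops when every possible removal would lower the accuracy.
    """
    active = list(range(X.shape[1]))
    best = loo_accuracy(X[:, active], y, lam)
    while len(active) > min_genes:
        scores = [
            loo_accuracy(X[:, [g for g in active if g != j]], y, lam)
            for j in active
        ]
        k = int(np.argmax(scores))
        if scores[k] < best:
            break
        best = scores[k]
        del active[k]
    return active, best

# Synthetic stand-in for expression data: 40 samples, 8 "genes",
# of which only the first two carry class signal.
rng = np.random.default_rng(0)
n, p = 40, 8
y = np.where(np.arange(n) < n // 2, -1.0, 1.0)
X = rng.normal(size=(n, p))
X[:, 0] += 2.0 * y
X[:, 1] -= 2.0 * y

selected, acc = backward_eliminate(X, y)
print("retained gene indices:", selected, "LOOCV accuracy:", acc)
```

Because a removal is accepted only when LOOCV accuracy does not decrease, the returned accuracy is never lower than that of the full gene set. Each elimination round here costs one kernel fit per remaining gene, which is the overhead an incremental parameter update is meant to avoid.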
Original language | English |
---|---|
Article number | 104803 |
Pages (from-to) | 104803 |
Journal | Journal of Biomedical Informatics |
Volume | 164 |
Publication status | Published - 28 Feb 2025 |
ASJC Scopus subject areas
- Health Informatics
- Computer Science Applications
Keywords
- Cancer
- Differentially expressed genes
- Gene detection
- Kernel machines
- Machine learning
- Microarray data