Stable Feature Selection using Improved Whale Optimization Algorithm for Microarray Datasets

  • Dipti Theng
    Computer Technology Department, YCCE, Nagpur, Maharashtra deepti.theng[at]
  • Kishor K Bhoyar
    Computer Science and Engineering Department, YCCE, Nagpur, Maharashtra


A microarray is a collection of DNA sequences that represent an organism's entire gene set, arranged in a grid pattern for use in genetic testing. Microarray datasets are extremely high-dimensional and have very small sample sizes, which leads to insufficient training data and high computational complexity. Identifying the true biomarkers, the most significant features (a very small subset of the complete feature set), addresses these issues: it reduces over-fitting and time complexity and improves model generalization. Various feature selection algorithms are used for this biomarker identification. This research proposes a modification to the whale optimization algorithm (WOAm) for biomarker discovery, in which the fitness of each search agent is evaluated using the hinge loss function during the hunting-for-prey phase to determine the optimal search agent. The results of the proposed modified algorithm are compared with those of the original whale optimization algorithm and of contemporary algorithms such as the marine predators algorithm and the grey wolf optimizer. All these algorithms are evaluated on six different high-dimensional microarray datasets. The proposed modification was observed to significantly improve the feature selection results across all the datasets. Because domain experts' trust in the resulting biomarkers and associated genes depends on the stability of the results, the stability of the selected feature sets was also evaluated. According to the findings, the proposed WOAm has superior stability compared to the other algorithms on the CNS, colon, leukemia, and OSCC datasets.
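The abstract relies on two measurable quantities: the hinge-loss fitness used to rank search agents, and the stability of the selected feature subsets. The sketch below shows one way to compute each in plain Python. It is not the authors' implementation. The function names, the 0/1 feature mask standing in for a binary whale position, the centroid-based linear scorer whose hinge loss is averaged, and the choice of average pairwise Jaccard similarity as the stability measure are all assumptions made for illustration.

```python
from itertools import combinations


def hinge_fitness(X, y, mask):
    """Mean hinge loss of a simple linear scorer restricted to the features in `mask`.

    X: list of samples (lists of floats); y: labels in {-1, +1}, both classes present;
    mask: one 0/1 flag per feature (assumed encoding of a binary whale position).
    Lower is better. The scorer (boundary halfway between class centroids) is an
    illustrative assumption, not the classifier used in the paper.
    """
    selected = [j for j, bit in enumerate(mask) if bit]
    if not selected:
        return float("inf")  # an empty subset cannot classify anything
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == -1]
    # Class centroids computed on the selected features only.
    mu_pos = [sum(x[j] for x in pos) / len(pos) for j in selected]
    mu_neg = [sum(x[j] for x in neg) / len(neg) for j in selected]
    # Linear decision function f(x) = w . x + b, zero at the midpoint of the centroids.
    w = [p - n for p, n in zip(mu_pos, mu_neg)]
    b = -sum(wk * (p + n) / 2 for wk, p, n in zip(w, mu_pos, mu_neg))
    losses = []
    for x, label in zip(X, y):
        score = sum(wk * x[j] for wk, j in zip(w, selected)) + b
        losses.append(max(0.0, 1.0 - label * score))  # hinge loss max(0, 1 - y f(x))
    return sum(losses) / len(losses)


def jaccard_stability(subsets):
    """Average pairwise Jaccard similarity of feature subsets from repeated runs.

    Returns 1.0 when every run selected exactly the same features.
    """
    pairs = list(combinations(subsets, 2))
    if not pairs:
        return 1.0
    total = 0.0
    for first, second in pairs:
        s, t = set(first), set(second)
        union = s | t
        total += len(s & t) / len(union) if union else 1.0
    return total / len(pairs)
```

In a WOA-style loop, a function like `hinge_fitness` would be called on every agent in each iteration, and the lowest-loss agent would be kept as the current best (the prey); `jaccard_stability` would then be applied to the subsets selected over repeated runs or data resamples. Jaccard similarity is used here because it handles subsets of different sizes; indices such as Kuncheva's, which correct for chance overlap but assume equal subset sizes, are a common alternative.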
How to cite: Theng, D., & Bhoyar, K. K. (2023). Stable Feature Selection using Improved Whale Optimization Algorithm for Microarray Datasets. ADCAIJ: Advances in Distributed Computing and Artificial Intelligence Journal, 12(1), e31187.