Integrated Ensemble Strategy for Breast Cancer Detection Using Dimensionality Reduction Technique

  • Zulfikar Ali Ansari
    Department of AI and ML, Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, Maharashtra-412115, India. Email: zulfi78692[at]gmail.com
  • Mohammad Arif
    Department of Computer Science & Engineering, Parul University, Vadodara-391760, India
  • Nagendra Babu Rajaboina
    Department of CSE, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Andhra Pradesh-522302, India
  • Anwar Ahamed Shaikh
    Department of CSE-AI, Noida Institute of Engineering and Technology, Greater Noida, Uttar Pradesh-201306, India
  • Yaduvir Singh
    School of Engineering and Technology, Sanjivani University, Kopargaon, Maharashtra, India

Abstract

Breast cancer remains a critical global health concern that requires reliable diagnostic methods for early detection and effective intervention. This work introduces an integrated ensemble framework that combines three dimensionality reduction (DR) techniques, Principal Component Analysis (PCA), Non-negative Matrix Factorization (NMF), and Singular Value Decomposition (SVD), with machine learning (ML) classifiers to improve breast cancer detection. The publicly available Wisconsin Breast Cancer Dataset (WBCD) was used, with preprocessing that applied median imputation and stratified sampling to address missing values, anomalies, and class imbalance. To mitigate overfitting and underfitting, dimensionality reduction was coupled with cross-validation and ensemble strategies. The predictive performance of Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), Decision Tree (DT), and Multi-Layer Perceptron (MLP) was systematically evaluated. Experimental results show that SVM reaches its maximum accuracy of 97.9% with every DR technique applied. MLP and LR also reach 97.9% accuracy with PCA and NMF, although MLP performance varies with the DR method selected. The findings provide practical guidance for healthcare practitioners and researchers and support the adoption of explainable and scalable AI-driven diagnostic tools. Limitations include the reliance on a single dataset and the need for further validation on larger and more diverse clinical cohorts. Future work will focus on model interpretability, external validation, and real-world deployment in resource-constrained settings.
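
The evaluation described in the abstract (median imputation, a DR step, a classifier, stratified cross-validation) can be sketched roughly as follows. This is a minimal illustration and not the authors' code. It assumes scikit-learn, uses scikit-learn's bundled Wisconsin Diagnostic dataset as a stand-in for the WBCD, and picks 10 components, 5 folds, min-max scaling, and default classifier settings purely for illustration. None of those values come from the paper, and the ensemble step is not reproduced.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import NMF, PCA, TruncatedSVD
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption): scikit-learn's copy of the Wisconsin Diagnostic
# dataset. It has no missing values, so the imputer below is a no-op here.
X, y = load_breast_cancer(return_X_y=True)

# The three DR techniques named in the abstract; n_components=10 is illustrative.
reducers = {
    "PCA": PCA(n_components=10, random_state=0),
    "NMF": NMF(n_components=10, init="nndsvda", max_iter=1000, random_state=0),
    "SVD": TruncatedSVD(n_components=10, random_state=0),
}

# The five classifiers named in the abstract, with assumed hyperparameters.
classifiers = {
    "LR": LogisticRegression(max_iter=1000),
    "SVM": SVC(kernel="rbf"),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "DT": DecisionTreeClassifier(random_state=0),
    "MLP": MLPClassifier(max_iter=1000, random_state=0),
}

# Stratified folds keep the benign/malignant ratio the same in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

scores = {}
for dr_name, reducer in reducers.items():
    for clf_name, clf in classifiers.items():
        pipe = Pipeline([
            ("impute", SimpleImputer(strategy="median")),
            # NMF needs non-negative input; clip=True keeps held-out
            # samples inside [0, 1] after scaling.
            ("scale", MinMaxScaler(clip=True)),
            ("reduce", reducer),
            ("clf", clf),
        ])
        fold_acc = cross_val_score(pipe, X, y, cv=cv, scoring="accuracy")
        scores[(dr_name, clf_name)] = fold_acc.mean()

for (dr_name, clf_name), acc in sorted(scores.items()):
    print(f"{dr_name:>3} + {clf_name:<3}: mean accuracy = {acc:.3f}")
```

Wrapping every step in a single Pipeline means the imputer, scaler, and reducer are refit on the training folds only, which avoids leaking held-out data into the DR step. The accuracies this sketch prints depend on the assumed settings and should not be expected to match the figures reported in the abstract.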

How to cite

Ansari, Z. A., Arif, M., Rajaboina, N. B., Shaikh, A. A., & Singh, Y. (2025). Integrated Ensemble Strategy for Breast Cancer Detection Using Dimensionality Reduction Technique. ADCAIJ: Advances in Distributed Computing and Artificial Intelligence Journal, 14, e31899. https://doi.org/10.14201/adcaij.31899
