Treating Colon Cancer Survivability Prediction as a Classification Problem

  • Ana Silva
    Universidade do Minho
  • Tiago Oliveira
  • José Neves
    Universidade do Minho
  • Paulo Novais
    Universidade do Minho


This work presents a survivability prediction model for colon cancer developed with machine learning techniques. Survivability was framed as a classification task in which the goal was to determine whether a patient would survive each of the five years following treatment. The model was built on the SEER dataset, which after preprocessing consisted of 38,592 records of colon cancer patients. Six features were chosen through a feature selection process to construct the model. This model was compared with a second model that uses 18 features indicated by a physician. The results show that the performance of the six-feature model is close to that of the 18-feature model, which suggests that the former may offer a good compromise between usability and performance.
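Framing survivability as one binary question per year can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' preprocessing: the field names `survival_months` and `died`, the 12-month cut-offs per year, and the choice to drop patients whose follow-up ended before the year in question (censored records) are all hypothetical choices made here.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

# The abstract's "each of the five years following treatment".
HORIZON_YEARS = 5


@dataclass
class PatientRecord:
    # Hypothetical fields: months from treatment to death or last follow-up,
    # and whether death was actually observed (False = alive at last contact).
    survival_months: int
    died: bool


def year_labels(record: PatientRecord, horizon: int = HORIZON_YEARS) -> List[Optional[bool]]:
    """One label per year k = 1..horizon.

    True  -> patient reached at least 12 * k months (assumed cut-off),
    False -> patient died before reaching 12 * k months,
    None  -> unknown: alive at last follow-up but followed for < 12 * k months.
    """
    labels: List[Optional[bool]] = []
    for year in range(1, horizon + 1):
        cutoff = 12 * year
        if record.survival_months >= cutoff:
            labels.append(True)
        elif record.died:
            labels.append(False)
        else:
            labels.append(None)  # censored: outcome for this year is not observed
    return labels


def build_year_dataset(
    records: Iterable[PatientRecord], year: int
) -> List[Tuple[PatientRecord, bool]]:
    """Training rows for the classifier of a given year (1-based).

    Records whose label for that year is unknown are dropped (an assumption;
    other censoring strategies are possible).
    """
    rows: List[Tuple[PatientRecord, bool]] = []
    for record in records:
        label = year_labels(record)[year - 1]
        if label is not None:
            rows.append((record, label))
    return rows
```

With this framing, one classifier per year could be trained on the rows returned by `build_year_dataset`, and the same procedure repeated with the six-feature and the 18-feature sets to compare them. For example, a patient who died at 30 months gets the labels `[True, True, False, False, False]`.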
How to cite: Silva, A., Oliveira, T., Neves, J., & Novais, P. (2016). Treating Colon Cancer Survivability Prediction as a Classification Problem. ADCAIJ: Advances in Distributed Computing and Artificial Intelligence Journal, 5(1), 37–50.