Educational Data Science and Machine Learning: A Case Study on University Student Dropout in Mexico


University dropout is a troubling phenomenon that affects students, educational institutions, and the state. Approaching it through Educational Data Science and Machine Learning techniques makes it possible to examine which students are likely to remain enrolled, which is why this research aims to predict dropout in the first year of university studies using these techniques. A practical case study is analyzed using the student database of a private university in Mexico. The study shows that the evaluation metrics, together with the visualization of the data structure used to analyze patterns, indicate that the characteristics that best predict institutional dropout in the first year of university are the student's average in the first period and the scholarship percentage.
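As a rough illustration of what "characteristics that best predict dropout" can mean in practice, the sketch below ranks features by how well each one alone separates dropouts from persisters, using a pairwise AUC. This is a simple univariate screening chosen for illustration, not the method used in the study. The records, the feature names, and the `age` column are hypothetical and invented for this example; they are not the study's data.

```python
from itertools import product

# Hypothetical toy records, invented for illustration only (NOT the study's data):
# (first_period_average, scholarship_pct, age, dropped_out)
RECORDS = [
    (6.1, 0, 19, 1),
    (6.8, 10, 22, 1),
    (7.0, 0, 18, 1),
    (7.4, 30, 20, 1),
    (7.9, 30, 19, 0),
    (8.5, 50, 21, 0),
    (9.0, 40, 18, 0),
    (9.4, 25, 20, 0),
]
FEATURES = ["first_period_average", "scholarship_pct", "age"]
LABEL_INDEX = 3


def auc(pos, neg):
    """Share of (dropout, persister) pairs where the dropout has the higher
    value; ties count as half a pair."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


def rank_features(records):
    """Rank features by direction-agnostic separability, max(AUC, 1 - AUC).
    A value of 0.5 means no separation; 1.0 means perfect separation."""
    ranking = []
    for i, name in enumerate(FEATURES):
        dropouts = [r[i] for r in records if r[LABEL_INDEX] == 1]
        persisters = [r[i] for r in records if r[LABEL_INDEX] == 0]
        a = auc(dropouts, persisters)
        ranking.append((name, max(a, 1.0 - a)))
    return sorted(ranking, key=lambda item: item[1], reverse=True)


for name, strength in rank_features(RECORDS):
    print(f"{name}: {strength:.3f}")
```

A univariate ranking like this ignores interactions between features, so a multivariate model evaluated on held-out students would be needed before drawing conclusions from real data.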
Kuz, A., & Morales, R. (2023). Educational Data Science and Machine Learning: A Case Study on University Student Dropout in Mexico. Education in The Knowledge Society, 24, e30080.

