Machine Learning techniques and Polygenic Risk Score application to prediction genetic diseases


Over the last 10 years, following important discoveries such as the genomic characterization of the human being, interest in research on risk prediction models for diseases of genetic origin has increased considerably. This research has followed two principal approaches: the Polygenic Risk Score and Machine Learning techniques. The aim of this work is to provide a narrative review of the literature on Machine Learning techniques applied to obtaining the polygenic risk score, highlighting the most relevant current research and applications. Although the application of these techniques has brought many benefits to disease prediction, the challenges of using and optimizing these two approaches are still being discussed and investigated, with the goal of achieving greater precision in the prediction of genetic diseases.
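The polygenic risk score is conventionally computed as a weighted sum over single-nucleotide polymorphisms (SNPs): PRS_i = Σ_j β_j x_ij. Here x_ij ∈ {0, 1, 2} is the number of risk alleles that individual i carries at SNP j, and β_j is that SNP's effect size (for example a log odds ratio) estimated in a genome-wide association study.

The following is a minimal Python sketch of that formula. It assumes the genotypes are already coded as allele dosages and aligned with the effect-size list. The optional p-value threshold illustrates a simple SNP-selection step. The function and parameter names are hypothetical, not taken from any particular tool.

```python
from typing import Optional, Sequence


def polygenic_risk_score(
    dosages: Sequence[float],
    effect_sizes: Sequence[float],
    p_values: Optional[Sequence[float]] = None,
    p_threshold: float = 1.0,
) -> float:
    """Return the weighted sum of risk-allele dosages for one individual.

    dosages      -- risk-allele counts per SNP (0, 1 or 2)
    effect_sizes -- per-SNP weights, e.g. GWAS log odds ratios
    p_values     -- optional GWAS p-values; SNPs with p > p_threshold are skipped
    """
    if len(dosages) != len(effect_sizes):
        raise ValueError("dosages and effect_sizes must have the same length")
    if p_values is None:
        # No filtering requested: every SNP contributes.
        p_values = [0.0] * len(dosages)
    elif len(p_values) != len(dosages):
        raise ValueError("p_values must have one entry per SNP")

    score = 0.0
    for x, beta, p in zip(dosages, effect_sizes, p_values):
        # Only SNPs passing the (hypothetical) significance threshold are summed.
        if p <= p_threshold:
            score += beta * x
    return score


if __name__ == "__main__":
    # Hypothetical toy data: three SNPs for a single individual.
    dosages = [2, 1, 0]
    betas = [0.5, 0.25, -0.125]
    print(polygenic_risk_score(dosages, betas))  # 1.25
    print(polygenic_risk_score(dosages, betas, [1e-8, 0.01, 0.5], 5e-8))  # 1.0
```

Real pipelines for biobank-scale data operate on whole genotype matrices and also account for linkage disequilibrium between SNPs, which this sketch omits. The resulting score can then be used as one input feature to a Machine Learning classifier.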
Mena Mamani, N. (2020). Machine Learning techniques and Polygenic Risk Score application to prediction genetic diseases. ADCAIJ: Advances in Distributed Computing and Artificial Intelligence Journal, 9(1), 5–14.



Author Biography

Nibeth Mena Mamani

University of Salamanca
Computer and Automation Department. Software Engineer in Computer Systems