Explainable Machine Learning for Mental Health Detection Using NLP

  • Noor Ul Ain Mushtaq
    Department of Computer System Engineering, Mehran University of Engineering and Technology, Indus Hwy, Jamshoro, Sindh 76062; Department of Biomedical Engineering, Mehran University of Engineering and Technology, Indus Hwy, Jamshoro, Sindh 76062. Email: noorulain.mushtaq[at]admin.muet.edu.pk
  • Sanam Narejo
    Department of Computer System Engineering, Mehran University of Engineering and Technology, Indus Hwy, Jamshoro, Sindh 76062
  • Syed Amjad Ali
    Department of Biomedical Engineering, Mehran University of Engineering and Technology, Indus Hwy, Jamshoro, Sindh 76062
  • Muhammad Moazzam Jawaid
    School of Engineering & Physical Sciences, University of Lincoln, Lincoln LN6 7TS, United Kingdom

Abstract

People often reveal their mental state through their social media activity, encouraged by the anonymity of the internet. Early detection of psychiatric issues through this activity can lead to timely interventions, potentially preventing severe mental health disorders such as depression and anxiety. However, the complexity of state-of-the-art machine learning (ML) models makes them hard to interpret, so they are often viewed as "black boxes". This paper provides a comprehensive analysis of explainable AI (XAI) within the framework of Natural Language Processing (NLP) and ML. NLP techniques improve the performance of learning-based methods by incorporating the semantic and syntactic features of the text. The application of ML in healthcare is gaining traction, particularly in extracting novel scientific insights from observational or simulated data, and domain knowledge is crucial for achieving scientific consistency and explainability. In our study, we implemented Naïve Bayes and Random Forest algorithms, achieving accuracies of 92% and 99%, respectively. To further explore transparency, interpretability, and explainability, we applied explainable ML techniques, among which LIME is a popular tool. Our findings underscore the importance of integrating XAI methods to better understand and interpret the decisions made by complex ML models.
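To make the idea of explaining a text classifier's decision concrete, the following is a minimal sketch and not the authors' pipeline. It assumes a tiny, made-up set of labelled posts, a hand-written multinomial Naïve Bayes classifier, and a leave-one-word-out perturbation score as a much simpler stand-in for LIME (LIME itself fits a weighted local surrogate model over many random perturbations). The labels `distress` and `neutral`, the helper names, and all example sentences are hypothetical; Random Forest is not covered here.

```python
import math
from collections import Counter, defaultdict

# Toy, invented posts for illustration only (NOT the paper's dataset).
TRAIN = [
    ("i feel hopeless and tired all the time", "distress"),
    ("i cannot sleep and feel so alone", "distress"),
    ("everything feels empty and hopeless lately", "distress"),
    ("had a great day hiking with friends", "neutral"),
    ("excited about the new job and feeling great", "neutral"),
    ("lovely dinner with family tonight", "neutral"),
]


def tokenize(text):
    """Lowercase whitespace tokenizer; a real pipeline would do more."""
    return text.lower().split()


class MultinomialNB:
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs):
        self.class_docs = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in docs:
            self.class_docs[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        self.total_docs = sum(self.class_docs.values())
        return self

    def predict_proba(self, text):
        vocab_size = len(self.vocab)
        log_scores = {}
        for label in self.class_docs:
            total = sum(self.word_counts[label].values())
            score = math.log(self.class_docs[label] / self.total_docs)
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are ignored
                    count = self.word_counts[label][word]
                    score += math.log((count + self.alpha) / (total + self.alpha * vocab_size))
            log_scores[label] = score
        # Normalize log scores into probabilities (numerically stable softmax).
        top = max(log_scores.values())
        exp = {k: math.exp(v - top) for k, v in log_scores.items()}
        norm = sum(exp.values())
        return {k: v / norm for k, v in exp.items()}


def explain(model, text, target):
    """Leave-one-word-out scores: drop in P(target) when each word is removed."""
    words = tokenize(text)
    base = model.predict_proba(text)[target]
    scores = []
    for i, word in enumerate(words):
        reduced = " ".join(words[:i] + words[i + 1:])
        scores.append((word, base - model.predict_proba(reduced)[target]))
    # Largest absolute influence first.
    return sorted(scores, key=lambda pair: abs(pair[1]), reverse=True)


model = MultinomialNB().fit(TRAIN)
post = "i feel hopeless but had dinner with friends"
probs = model.predict_proba(post)
ranking = explain(model, post, "distress")

for word, score in ranking:
    print(f"{word:10s} {score:+.3f}")
```

A positive score means the word pushes the prediction toward the `distress` label and a negative score pushes it away. A word the model never saw in training scores zero, which is one way such explanations can expose gaps in a model's vocabulary.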

How to cite

Mushtaq, N. U. A., Narejo, S., Amjad Ali, S., & Moazzam Jawaid, M. (2025). Explainable Machine Learning for Mental Health Detection Using NLP. ADCAIJ: Advances in Distributed Computing and Artificial Intelligence Journal, 14, e32449. https://doi.org/10.14201/adcaij.32449
