Sentiment Analysis with Machine Learning Methods on Social Media

Abstract

Social media has become an important part of everyday life owing to the widespread use of the Internet. Among social media services, Twitter is one of the most widely used around the world. People share their opinions by writing tweets on numerous subjects, such as politics, sports, and the economy. Millions of tweets per day form a huge dataset, which has drawn the attention of data scientists to these data for sentiment analysis. Sentiment analysis aims to identify users' social media posts about a specific topic and categorize them as positive, negative, or neutral. This study investigates the effect of the type of text representation on the performance of sentiment analysis. Two datasets were used in the experiments. The first consists of user reviews of movies from IMDB, labeled by Kotzias et al.; the second consists of English-language tweets on health topics posted in 2019, collected using the Twitter API. The Python programming language was used both to implement the classification models with the Naïve Bayes (NB), Support Vector Machine (SVM), and Artificial Neural Network (ANN) algorithms and to categorize the sentiments as positive, negative, or neutral. Features were extracted from the datasets using the Term Frequency-Inverse Document Frequency (TF-IDF) and Word2Vec (W2V) modeling techniques. The accuracy of the classification algorithms was then compared. According to the experimental results, the ANN achieved the best accuracy on both datasets.
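
The abstract does not reproduce the study's implementation, so the following is only a minimal pure-Python sketch of one of the combinations it names: TF-IDF features fed to a multinomial Naïve Bayes classifier. The toy reviews, the function and class names, and the smoothed IDF formula are illustrative assumptions and are not taken from the study; the SVM, ANN, and Word2Vec variants are omitted.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Assumption: lowercase whitespace tokenization; real preprocessing would do more.
    return text.lower().split()

def fit_idf(docs):
    """Smoothed IDF per term: ln((1 + N) / (1 + df)) + 1 (an assumed variant)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}

def tfidf(doc, idf):
    """Map a document to {term: tf * idf}, skipping terms unseen in training."""
    tf = Counter(tokenize(doc))
    return {term: count * idf[term] for term, count in tf.items() if term in idf}

class NaiveBayes:
    """Multinomial Naive Bayes over TF-IDF weights with add-one smoothing."""

    def fit(self, vectors, labels, vocab):
        self.classes = sorted(set(labels))
        n = len(labels)
        self.log_prior = {c: math.log(labels.count(c) / n) for c in self.classes}
        # Accumulate the TF-IDF mass of each term within each class.
        weight = {c: defaultdict(float) for c in self.classes}
        for vec, label in zip(vectors, labels):
            for term, w in vec.items():
                weight[label][term] += w
        self.log_like = {}
        for c in self.classes:
            total = sum(weight[c].values()) + len(vocab)
            self.log_like[c] = {
                term: math.log((weight[c][term] + 1.0) / total) for term in vocab
            }
        return self

    def predict(self, vec):
        # Score each class by log prior plus weighted log likelihoods.
        scores = {
            c: self.log_prior[c]
            + sum(w * self.log_like[c][term] for term, w in vec.items())
            for c in self.classes
        }
        return max(scores, key=scores.get)

# Hypothetical toy data standing in for labeled reviews.
train_docs = [
    "great movie loved it",
    "wonderful acting great story",
    "terrible movie hated it",
    "boring plot awful acting",
]
train_labels = ["positive", "positive", "negative", "negative"]

idf = fit_idf(train_docs)
train_vectors = [tfidf(doc, idf) for doc in train_docs]
model = NaiveBayes().fit(train_vectors, train_labels, set(idf))

print(model.predict(tfidf("loved the great story", idf)))
print(model.predict(tfidf("awful boring movie", idf)))
```

Comparing text representations, as the study does, would amount to swapping `tfidf` for another document-to-vector function while holding the classifier and the evaluation data fixed.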

References

Amolik, A., Jivane, N., Bhandari, M., and Venkatesan, M., 2016. Twitter sentiment analysis of movie reviews using machine learning techniques. International Journal of Engineering and Technology, 7(6): 1-7.

Elghazaly, T., Mahmoud, A., and Hefny, H. A., 2016. Political sentiment analysis using twitter data. In: Proceedings of the International Conference on Internet of things and Cloud Computing, 1-5.

Elmas, Ç., 2003. Yapay Sinir Ağları (Kuram, Mimari, Eğitim, Uygulama). Ankara: Seçkin Yayıncılık.

Harrington, P., 2012. Machine learning in action. Shelter Island, NY: Manning Publications Co.

Hamoud, A. A., Alwehaibi, A., Roy, K., and Bikdash, M., 2018. Classifying political tweets using Naïve Bayes and support vector machines. In International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems (736-744). Springer, Cham.

Huq, M. R., Ali, A., and Rahman, A., 2017. Sentiment analysis on Twitter data using KNN and SVM. (IJACSA) International Journal of Advanced Computer Science and Applications, 8(6): 19-25.

Kayikci, S., Akyazi, E., 2018. Classification of Open Directory Web Pages Using Artificial Neural Networks. International Journal of Scientific and Technological Research, 2422-8702

Kaynar, O., Görmez, Y., Yıldız, M., and Albayrak, A., 2016. Makine öğrenmesi yöntemleri ile Duygu Analizi. In International Artificial Intelligence and Data Processing Symposium (IDAP’16), 17-18.

Kotzias, D., Denil, M., De Freitas, N., and Smyth, P. 2015. From group to individual labels using deep features. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 597-606.

Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., and Dean, J., 2013. Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems, 26: 3111-3119.

Nikfarjam, A, Sarker, A, O’Connor, K, Ginn, R, and Gonzalez, G., 2015. Pharmacovigilance from social media: mining adverse drug reaction mentions using sequence labeling with word embedding cluster features, Journal of the American Medical Informatics Association, 22(3): 671-681

Nizam, H., Akın, S. S., 2014. Sosyal medyada makine öğrenmesi ile duygu analizinde dengeli ve dengesiz veri setlerinin performanslarının karşılaştırılması. XIX. Türkiye’de İnternet Konferansı.

Pang, B., Lee, L., and Vaithyanathan, S. 2002. Thumbs up? Sentiment classification using machine learning techniques. arXiv preprint cs/0205070.

Rana, S. and Singh, A., 2016. Comparative analysis of sentiment orientation using SVM and Naïve Bayes techniques, 2016 2nd International Conference on Next Generation Computing Technologies (NGCT), Dehradun, pages 106-111, doi: 10.1109/NGCT.2016.7877399.

Rogers, R., 2014. Debanalising Twitter. Twitter and Society, New York, NY, ix-xxxviii.

Sjögren, R., Stridh, K., Skotare, T., and Trygg, J., 2020. Multivariate patent analysis–Using chemometrics to analyze collections of chemical and pharmaceutical patents. Journal of Chemometrics, 34(1): e3041.

Song, O., Hu, W., and Xie, W., 2002. Robust Support Vector Machine with Bullet Hole Image Classification, IEEE Transactions on Systems, Man and Cybernetics – Part C: Applications and Reviews, 32(4): 440-448.

Symeonidis, S., Effrosynidis, D., and Arampatzis, A., 2002. A comparative evaluation of pre-processing techniques and their interactions for Twitter sentiment analysis. Expert System Applications, 110: 298-310.

Türkmen, A. C., and Cemgil, A. T., 2014. Political interest and tendency prediction from microblog data. In: 22nd Signal Processing and Communications Applications Conference (SIU). IEEE, 1327-1330.

Wright, G., Rodriguez, A., Li, J., Clark, P. L., Milenković, T., and Emrich, S. J., 2020. Analysis of computational codon usage models and their association with translationally slow codons. PloS one, 15(4): e0232003.

Xiao, C., Xia, W., and Jiang, J., 2020. Stock price forecast based on combined model of ARI-MA-LS-SVM. Neural Computing and Applications, 1-10.

How to Cite

Basarslan, M. S., & Kayaalp, F. (2020). Sentiment Analysis with Machine Learning Methods on Social Media. ADCAIJ: Advances in Distributed Computing and Artificial Intelligence Journal, 9(3), 5–15. https://doi.org/10.14201/ADCAIJ202093515

Author Biographies

Muhammet Sinan Basarslan

Dogus University
He was born in Istanbul in December 1991. He graduated from the Düzce University Computer Engineering Department in 2015 and completed his master's degree in the same department in 2017 with a study on customer churn analysis. He is currently working there on his PhD thesis titled 'Developing a Hybrid Method for Sentiment Analysis in Social Media'. He worked on various projects at Düzce Technopark (Unibim Bilisim) and Turkish Telecom between 2016 and 2018. He has been working as a lecturer at Dogus University Vocational School since February 2018. In addition, he lectures at Kavram Vocational School and teaches undergraduate courses at Dogus University. His primary research interests include artificial intelligence and web technologies.

Fatih Kayaalp

Duzce University
He received the BS degree in Computer Science from Marmara University in 2000, the MS degree in Computer Science from Sakarya University in 2005, and the PhD degree in Computer Science from Sakarya University in 2014. He is currently working as an Assistant Professor at Duzce University. His primary research interests include databases, web technologies, computer networks, wireless sensor networks, and mobile computing.