An approach for discovering keywords from Spanish tweets using Wikipedia
Abstract
Most approaches to keyword discovery in microblogging messages (such as those from Twitter) rely on statistical and lexical information about the words that make up the text. Because the messages are short, they provide little context, and the resulting low co-occurrence of words can be problematic. In this paper, we present a new approach for discovering keywords from Spanish tweets that adds context information using Wikipedia as a knowledge base. We present four different ways to use Wikipedia and two ways to rank the new keywords. We tested these strategies on more than 60,000 Spanish tweets, measuring performance and analyzing the particularities of each strategy.
Ayala, D., Roldán, J. C., Ruiz, D., & Gallego, F. O. (2015). An approach for discovering keywords from Spanish tweets using Wikipedia. ADCAIJ: Advances in Distributed Computing and Artificial Intelligence Journal, 4(2), 73–88. https://doi.org/10.14201/ADCAIJ2015427388