Consensus-based Approach for Keyword Extraction from Urban Events Collections

  • Ana OLIVEIRA Alves
    Centre of Informatics and Systems, University of Coimbra, Portugal & Polytechnic Institute of Coimbra, Portugal ana[at]dei.uc.pt
  • Bernardete Ribeiro
    Department of Informatics Engineering, University of Coimbra, Portugal

Abstract

Automatic keyword extraction (AKE) from textual sources is a valuable step towards the efficient scanning of large document collections. Particularly in the context of urban mobility, where the most relevant events in the city are advertised online, it becomes difficult to know exactly what is happening in a place. In this paper we tackle this problem by extracting a set of keywords from different kinds of textual sources, focusing on the urban events context. We propose an ensemble of automatic keyword extraction systems: KEA (Key-phrase Extraction Algorithm), KUSCO (Knowledge Unsupervised Search for instantiating Concepts on lightweight Ontologies), and Conditional Random Fields (CRF). Unlike KEA and KUSCO, which are well-known tools for automatic keyword extraction, CRF needs further pre-processing. Therefore, a tool for handling AKE from documents using CRF is developed. The architecture of the AKE ensemble system is designed, and an efficient integration of the component applications is presented, in which a consensus between the classifiers is achieved. Finally, we empirically show that our AKE ensemble system significantly succeeds on both baseline sources and urban events collections.
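
The abstract does not state how the consensus between the three extractors is computed. As a minimal illustrative sketch only, assuming a simple majority vote over the keyword sets returned by each system, the idea could look like the following in Python. The function name `consensus_keywords`, the `min_votes` parameter, and the sample outputs are hypothetical and are not taken from the paper.

```python
from collections import Counter


def consensus_keywords(candidate_sets, min_votes=2):
    """Return keywords proposed by at least `min_votes` extractors.

    `candidate_sets` maps an extractor name to the list of keywords it
    produced for one document. Majority voting is an assumption made for
    this sketch, not the consensus rule described in the paper.
    """
    votes = Counter()
    for keywords in candidate_sets.values():
        # Normalise case and whitespace, and use a set so that each
        # extractor contributes at most one vote per keyword.
        votes.update({" ".join(k.lower().split()) for k in keywords})
    agreed = [k for k, v in votes.items() if v >= min_votes]
    # Most-voted keywords first, ties broken alphabetically.
    return sorted(agreed, key=lambda k: (-votes[k], k))


# Hypothetical outputs of the three systems for one event description.
outputs = {
    "KEA": ["jazz festival", "live music", "city park"],
    "KUSCO": ["Jazz Festival", "city park", "concert"],
    "CRF": ["jazz festival", "concert", "tickets"],
}
print(consensus_keywords(outputs))
```

Raising `min_votes` to the number of extractors would keep only keywords on which all systems agree, trading recall for precision.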
Alves, A. O., & Ribeiro, B. (2015). Consensus-based Approach for Keyword Extraction from Urban Events Collections. ADCAIJ: Advances in Distributed Computing and Artificial Intelligence Journal, 4(2), 41–60. https://doi.org/10.14201/ADCAIJ2015424160


Author Biographies

Ana OLIVEIRA Alves

Centre of Informatics and Systems, University of Coimbra, Portugal & Polytechnic Institute of Coimbra, Portugal
Ana Alves is a full-member researcher at the Centre for Informatics and Systems of the University of Coimbra in Portugal. She received an M.Sc. and a PhD degree in Informatics Engineering, both from the Informatics Engineering Department, University of Coimbra, Coimbra, Portugal. She is an Assistant Professor at the Polytechnic Institute of Coimbra. Her main research interests and publications relate to Ambient Intelligence, Information Extraction and Semantics, and Natural Language Processing.

Bernardete Ribeiro

Department of Informatics Engineering, University of Coimbra, Portugal
Bernardete Ribeiro is a Professor at the Informatics Engineering Department, Faculty of Science and Technology, University of Coimbra in Portugal. She received an MSc degree in Computer Science and a PhD in Informatics Engineering, both from the Informatics Engineering Department, University of Coimbra. Her main publications are in the areas of neural networks and their applications to engineering systems, computational intelligence, and support vector machines. She is a member of ACM and IEEE.