A Systematic Analysis of Various Word Sense Disambiguation Approaches

Chandra Ganesh; Sanjay K. Dwivedi; Satya Bhushan Verma; Manish Dixit

doi:10.14201/adcaij.31602

Chandra Ganesh
Department of Computer Science & Engineering, Madhav Institute of Technology & Science (Deemed University), Gwalior, M.P., India

Sanjay K. Dwivedi
Department of Computer Science, Babasaheb Bhimrao Ambedkar (A Central) University, Lucknow, U.P., India

Satya Bhushan Verma
Computer Science & Engineering, Shri Ramswaroop Memorial University, Lucknow-Deva Road, Barabanki, India, 225003

Manish Dixit
Department of Computer Science & Engineering, Madhav Institute of Technology & Science (Deemed University), Gwalior, M.P., India

Abstract

Word sense disambiguation (WSD) is the task of identifying the correct sense of a word in a given context. It has become a growing research area within natural language processing, and over the decades researchers have proposed many approaches to it. Progress in WSD has had a significant impact on several Web-based applications, such as information retrieval and information extraction. This paper describes the main families of WSD approaches: knowledge-based, supervised, unsupervised, and semi-supervised. It also describes applications of WSD, including information retrieval, machine translation, speech recognition, computational advertising, text processing, document classification, and biometrics.
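
To make the knowledge-based family concrete, here is a minimal sketch of a simplified version of the dictionary gloss-overlap heuristic usually attributed to Lesk (1986): each candidate sense is scored by how many content words its gloss shares with the surrounding sentence. The sense inventory, the sense labels such as `bank.finance`, the glosses, and the stopword list are hypothetical toy data (not WordNet entries), and the function name `simplified_lesk` is illustrative; this is not the authors' method.

```python
from __future__ import annotations

import re

# Hypothetical toy sense inventory for illustration only (not WordNet glosses).
SENSE_GLOSSES = {
    "bank": {
        "bank.finance": "a financial institution that accepts deposits and lends money",
        "bank.river": "sloping land beside a body of water such as a river",
    },
}

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"a", "an", "the", "of", "and", "that", "such", "as",
             "to", "at", "in", "on", "from", "she", "he", "they"}


def content_words(text: str) -> set[str]:
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(word: str, sentence: str) -> str | None:
    """Pick the sense whose gloss shares the most content words with the sentence.

    Ties (including zero overlap) keep the earliest listed sense.
    """
    senses = SENSE_GLOSSES.get(word)
    if not senses:
        return None  # word is not in the toy inventory
    context = content_words(sentence) - {word}  # ignore the target word itself
    best_sense, best_overlap = None, -1
    for sense, gloss in senses.items():
        overlap = len(context & content_words(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


if __name__ == "__main__":
    print(simplified_lesk("bank", "She went to deposit money at the bank"))  # prints bank.finance
    print(simplified_lesk("bank", "They fished from the grassy bank of the river"))  # prints bank.river
```

In the first sentence the context word "money" overlaps the financial gloss; in the second, "river" overlaps the riverbank gloss. Because ties keep the first listed sense, a sentence with no overlap falls back to `bank.finance`, a crude stand-in for a most-frequent-sense baseline. Exact string matching also misses inflected forms ("deposit" versus "deposits"), which is why practical variants typically lemmatize before comparing.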

How to cite

Ganesh, C., Dwivedi, S. K., Verma, S. B., & Dixit, M. (2024). A Systematic Analysis of Various Word Sense Disambiguation Approaches. ADCAIJ: Advances in Distributed Computing and Artificial Intelligence Journal, 13(1), e31602. https://doi.org/10.14201/adcaij.31602

Editorial dates

Submitted: 15-08-2023
Accepted: 27-12-2023
Published: 02-12-2024

Issue: Vol. 13 (2024)

Section: Articles

Keywords

word sense disambiguation
knowledge-based approach
supervised approach
unsupervised approach

Supporting agencies

This research received no funding.

License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.