Biomedical Literature Exploration through Latent Semantics
Abstract The rapidly growing number of articles published in the biomedical field makes it difficult for researchers to exploit this wealth of information efficiently. To overcome this limitation and enable more efficient use of the literature, we propose an approach for structuring the results of a literature search based on latent semantic information extracted from a corpus. Moreover, we show how the results of the Latent Semantic Analysis method can be adapted to highlight differences between the results of different searches. We also propose several visualization techniques that can be applied to explore these results. Used in combination, these techniques could give users tools for literature-guided knowledge exploration and discovery.
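To make the idea of latent semantic structure concrete, the following is a minimal sketch of Latent Semantic Analysis applied to a handful of search results. It is not the authors' implementation: the toy corpus, the function names, the raw term-count weighting, and the choice of k = 2 latent dimensions are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy corpus standing in for abstracts returned by a literature search.
docs = [
    "gene expression in breast cancer",
    "breast cancer gene mutation",
    "protein folding and structure",
    "protein structure prediction",
]


def term_document_matrix(texts):
    """Build a raw term-count matrix (terms x documents) and the sorted vocabulary."""
    vocab = sorted({word for text in texts for word in text.split()})
    index = {word: i for i, word in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(texts)))
    for j, text in enumerate(texts):
        for word in text.split():
            counts[index[word], j] += 1
    return counts, vocab


def lsa_document_vectors(counts, k):
    """Project documents into a k-dimensional latent space via truncated SVD."""
    _, s, vt = np.linalg.svd(counts, full_matrices=False)
    # Keep the k largest singular values; each row of the result is one document.
    return (np.diag(s[:k]) @ vt[:k]).T


def cosine(a, b):
    """Cosine similarity between two latent document vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


counts, vocab = term_document_matrix(docs)
doc_vecs = lsa_document_vectors(counts, k=2)

print("cancer vs cancer :", cosine(doc_vecs[0], doc_vecs[1]))
print("cancer vs protein:", cosine(doc_vecs[0], doc_vecs[2]))
```

In this sketch the two cancer-related documents end up close together in the latent space, while a cancer document and a protein document do not. Grouping search results along such latent dimensions is one way the structuring described above could be approached.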
Matos, S., Araújo, H., & Oliveira, J. L. (2013). Biomedical Literature Exploration through Latent Semantics. ADCAIJ: Advances in Distributed Computing and Artificial Intelligence Journal, 2(2), 65–74.