Title: Beyond "Bag of Words": Towards a Framework for Conceptual Retrieval
Speaker: Jimmy Lin, University of Maryland
Date: Thursday, February 22, 2007, 3:30 pm
Location: SCILS Faculty Lounge (4 Huntington, off of College Ave.), Rutgers University
Abstract:
Although the field of information retrieval has made enormous progress in the last half century, virtually all systems are still built on the remarkably simple concept of "counting words", under strong assumptions of term independence. These methods have been empirically validated (e.g., in TREC evaluations), but it is a simple fact that words alone cannot capture the semantic content of documents and information needs.
In this talk, I will discuss a framework for "conceptual retrieval" that articulates the types of knowledge that are important for information seeking. This general framework is instantiated in a clinical question answering system that operationalizes the principles of evidence-based medicine (EBM). Experiments show that an EBM-based scoring algorithm outperforms a state-of-the-art baseline that employs only term statistics. Ablation studies further yield a better understanding of the performance contributions of different components.
I will conclude by discussing how other domains can benefit from knowledge-based approaches and the general applicability of this proposed framework.
Biography:
Jimmy Lin is an assistant professor in the College of Information Studies (CLIS) at the University of Maryland, and is also a member of the Computational Linguistics and Information Processing (CLIP) laboratory in UMD's Institute for Advanced Computer Studies (UMIACS). He graduated with a Ph.D. in computer science from MIT in 2004. Jimmy's research lies at the intersection of information retrieval, natural language processing, and information science. He has also worked on theoretical linguistics at the syntax-semantics interface.