Title: Latent Semantic Indexing: What can we learn from a close study of the data?
Speaker: April Kontostathis, Ursinus College
Date: Monday, October 8, 2007 12:00 - 1:30 pm
Location: DyDan Center, CoRE Bldg, Room 431, Rutgers University, Busch Campus, Piscataway, NJ
Abstract:
This session will focus on an analysis of the values used by Latent Semantic Indexing (LSI) for information retrieval. LSI is based on a linear algebraic technique, the Singular Value Decomposition (SVD). We will show that a close analysis of the values in the SVD has led to one theoretical and two practical results. We will first discuss a theoretical framework based on term co-occurrence data, and show a strong correlation between second-order term co-occurrence and the values produced by the SVD algorithm. We will then discuss two modifications to LSI that have been shown to reduce resource requirements while maintaining or improving retrieval performance, as measured by precision and recall.
The speaker's doctoral dissertation, A Term Co-occurrence Based Framework for Understanding LSI: Theory and Practice, developed a theoretical model for understanding LSI that is largely based on the values produced by the SVD process.
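For readers unfamiliar with the terms in the abstract, the following is a minimal sketch of what "second-order term co-occurrence" and "the values produced by the SVD" can look like on a toy example. It is not taken from the talk or the dissertation: the five-term, three-document matrix, the choice of rank k = 2, and all variable names are assumptions made purely for illustration. Here "car" and "motor" never appear in the same document, but both appear with "engine", so they co-occur only at second order.

```python
import numpy as np

# Hypothetical toy term-by-document matrix (rows = terms, columns = documents).
# d1 = {car, engine}, d2 = {engine, motor}, d3 = {apple, fruit}
terms = ["car", "engine", "motor", "apple", "fruit"]
A = np.array([
    [1, 0, 0],   # car
    [1, 1, 0],   # engine
    [0, 1, 0],   # motor
    [0, 0, 1],   # apple
    [0, 0, 1],   # fruit
], dtype=float)

# First-order co-occurrence: two terms share at least one document.
first_order = (A @ A.T) > 0

# Second-order only: two terms share a neighbour term but no document.
reach_two = (first_order.astype(int) @ first_order.astype(int)) > 0
second_order_only = reach_two & ~first_order

# Truncated SVD, keeping the k largest singular values (k = 2 is an assumption).
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, s_k = U[:, :k], s[:k]

# Term-to-term values in the reduced space: U_k * diag(s_k^2) * U_k^T.
T_k = (U_k * s_k**2) @ U_k.T

car, motor, apple = terms.index("car"), terms.index("motor"), terms.index("apple")
raw_car_motor = (A @ A.T)[car, motor]   # no shared document in the raw data
lsi_car_motor = T_k[car, motor]         # value after rank-k truncation
lsi_car_apple = T_k[car, apple]         # unrelated terms
```

In this toy case the raw term-to-term value for "car" and "motor" is zero, while the rank-2 value is positive, and the value for the unrelated pair "car" and "apple" stays at zero. This only illustrates the kind of relationship the abstract describes; it does not reproduce the speaker's analysis or results.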