DyDAn Homeland Security Seminar Series


Title: The Expected Determinant of the Random Gram Matrix and its Application to Information Retrieval Systems.

Speaker: Jacob Martin, Trinity College Dublin

Date: Monday, October 22, 2007 12:00 - 1:30 pm

Location: DyDan Center, CoRE Bldg, Room 431, Rutgers University, Busch Campus, Piscataway, NJ


Abstract:

Reduced rank matrix approximations obtained from singular value decompositions are routinely used to improve information retrieval applications (e.g. search engines). Although computing singular values can become intractable for very large matrices, probabilistic techniques can give us a shortcut. Furthermore, probability models can help explain the observed improvement in information retrieval performance when reducing the rank of a matrix.

Let w be a t-tall vector whose components are random variables (not necessarily independent). Next sample n independent vectors, creating a t by n matrix A. Then the random Gram matrix of size n is G(n) = A^(T)A. This talk partially concerns the derivation of an exponential generating function for the expected value of the determinant of G(n), E(det(G(n)). A recurrence derived from the generating function allows the computation of E(det(G(n+1))) from previously computed values. These methods can be used to calculate the expected coefficients of a polynomial whose roots exponentially approach the actual singular values of A as n approaches infinity.

The expected singular values can provide a starting point for computing singular value decompositions. Moreover, the association of particular expected singular values with certain probability models leads to a method for constructing special reduced rank approximations that exhibit better performance in comparison with traditional reduced rank approximations (e.g. Latent Semantic Indexing).