Many security-related questions involve drawing inferences from dynamic data in a nonstationary environment. For instance: has the rate of messages from a given individual changed? Has the flow of cash into or out of an organization changed? Has there been an unexpected change in travel patterns? Our project treats the problem of making decisions in the presence of random processes that change over time as a learning problem. Real-time adaptive learning requires estimating parameters as data arrive, which often means refitting regression models on massive, dynamic datasets. We will leverage recent research on estimation algorithms for large datasets to develop new algorithms that solve these estimation problems dynamically. We will also design information collection strategies for dynamic, nonstationary environments, drawing on both the experimental design literature (which focuses on minimizing statistical error) and the literature on bandit processes (which considers the cost of making wrong decisions). In a massive dataset, the problem is one of optimal sampling. In other settings, we may be able to directly control people and equipment (e.g., through financial information or airline ticket monitoring) to collect information. In both cases, we assume there is either a cost for collecting information or a constraint on how much we can analyze, and so we seek to optimize data collection in resource-constrained contexts.
Document last modified on August 17, 2007.