Title: Information Extraction and Annotation as a Methodology for Complex Domain Modeling
In the past, Information Extraction (IE) systems focused on a few well-defined classes of entities (people, organizations, locations, etc.) and on easily specified kinds of events. Recently, however, demand for IE has grown, and the items that domain experts want extracted have become increasingly complex and hard to define. In our experiments in several domains (biomedicine, government intelligence, and eGovernment), we have found that what the domain expert needs from an IE engine is usually neither easy to extract nor even easy to define. In fact, it is rather common for experts, in trying to specify exactly what they want, to go through a process of discovery that leaves them surprised at how little they understood some of the details of their own domain. Our solution has been to engage experts in a process of annotation, in which they highlight examples of what they need in text, followed by joint analysis and decomposition of the complex concepts present. We then train IE engines to extract the desired information, and we use low performance as an indicator of potential problems in the definition of the target concepts. Using this methodology, we have learned to factor some surprisingly sophisticated notions in psychology into extractable types and to identify argumentation structure in emails.
Title: Sequential Decision Making Algorithms for Port of Entry Inspection
The problem of inspecting containers at ports of entry for weapons of mass destruction can be formulated as a problem in sequential decision making, in which we decide which test to perform on a container based on the outcomes of previous tests. Following the work of Stroud and Saeger at Los Alamos, we investigate the formulation of the port of entry inspection problem as the problem of finding an optimal binary decision tree for an appropriate Boolean decision function. We report on an experimental analysis of the robustness of the conclusions of the Stroud-Saeger analysis and show that the optimal inspection strategy is remarkably insensitive to variations in the parameters needed to apply the Stroud-Saeger method. We also report on new algorithms for finding optimal binary decision trees that are computationally more efficient than those presented by Stroud and Saeger. We achieve these efficiencies through a combination of specific numerical methods for finding optimal thresholds for sensor functions and a binary decision tree search algorithm that operates on a space of potentially acceptable binary decision trees.
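To make the decision-tree formulation concrete, here is a minimal sketch of an exhaustive search for a cheapest binary decision tree that computes a given Boolean decision function. It is not the Stroud-Saeger method or the new algorithms from the talk. It assumes error-free sensors, independent attributes, and a cost made up only of per-test charges, and it leaves out sensor thresholds and misclassification costs. The names optimal_tree and evaluate, the example decision function, and all costs and probabilities are hypothetical.

```python
from functools import lru_cache
from itertools import product


def optimal_tree(f, n, cost, p):
    """Exhaustively find a minimum expected-cost decision tree for f.

    f    : Boolean decision function on n-tuples of 0/1 sensor readings
    cost : cost[i] is the (assumed) cost of running sensor i
    p    : p[i] is the (assumed) probability that sensor i reads 1,
           taken to be independent of the other sensors
    A tree is either a leaf (0 or 1) or a tuple
    (sensor, subtree_if_0, subtree_if_1).
    """

    @lru_cache(maxsize=None)
    def solve(fixed):
        # fixed[i] is 0 or 1 for sensors already read, None otherwise.
        free = [i for i in range(n) if fixed[i] is None]
        outcomes = set()
        for bits in product((0, 1), repeat=len(free)):
            x = list(fixed)
            for i, b in zip(free, bits):
                x[i] = b
            outcomes.add(int(f(tuple(x))))
        if len(outcomes) == 1:
            # The decision is already determined: stop testing.
            return 0.0, outcomes.pop()
        best = None
        for i in free:
            cost0, tree0 = solve(fixed[:i] + (0,) + fixed[i + 1:])
            cost1, tree1 = solve(fixed[:i] + (1,) + fixed[i + 1:])
            expected = cost[i] + (1 - p[i]) * cost0 + p[i] * cost1
            if best is None or expected < best[0]:
                best = (expected, (i, tree0, tree1))
        return best

    return solve((None,) * n)


def evaluate(tree, x):
    """Follow the tree on readings x and return the 0/1 decision."""
    while isinstance(tree, tuple):
        sensor, if0, if1 = tree
        tree = if1 if x[sensor] else if0
    return tree


# Hypothetical decision function: flag a container when sensor 0 fires
# together with at least one of sensors 1 and 2.
def decision(x):
    return x[0] and (x[1] or x[2])


expected_cost, tree = optimal_tree(
    decision, 3, cost=[1.0, 2.0, 0.5], p=[0.1, 0.5, 0.5]
)
print(expected_cost, tree)
```

The search is exponential in the number of sensors, so it is only practical for a handful of tests. Its purpose is to show what "optimal binary decision tree" means in this setting.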
Title: Global Learning and Inference with Constraints
The maturity of machine learning techniques allows us today to learn many low-level predicates and to generate an appropriate vocabulary over which reasoning methods can be used to make significant progress in higher-level domain decisions.
I will describe research on a framework that combines learning and inference and exhibit its use in the natural language processing domain. Key to this framework is the ability to incorporate declarative and expressive global information into the learning and decision stages. I will discuss the use of this framework as (1) a way to allow the output of local classifiers for different problem components to be assembled into a whole that reflects global preferences and constraints; (2) a way to improve probabilistic models by enforcing additional expressive constraints; and (3) a way to significantly improve semi-supervised training of structured models.
Examples will be drawn from 'wh' attribution in natural language processing (determining who did what to whom when and where) and from information extraction problems.
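Use (1) above, assembling the outputs of local classifiers into a whole that respects global constraints, can be illustrated with a small sketch. It is not the framework from the talk: it uses brute-force enumeration rather than any particular inference procedure, and the labels (WHO, WHAT, NONE), the scores, and the single constraint are hypothetical.

```python
from itertools import product


def constrained_argmax(scores, constraints):
    """Pick the highest-scoring joint labeling that satisfies all constraints.

    scores      : one dict per component, mapping label -> local classifier score
    constraints : functions that take a full assignment (tuple of labels)
                  and return True when the assignment is acceptable
    Brute force over all joint assignments; fine for a toy example only.
    """
    label_sets = [sorted(s) for s in scores]
    best, best_score = None, float("-inf")
    for assignment in product(*label_sets):
        if not all(check(assignment) for check in constraints):
            continue
        total = sum(s[label] for s, label in zip(scores, assignment))
        if total > best_score:
            best, best_score = assignment, total
    return best, best_score


def no_duplicate_roles(assignment):
    """Declarative constraint: each role other than NONE appears at most once."""
    roles = [label for label in assignment if label != "NONE"]
    return len(roles) == len(set(roles))


# Hypothetical local scores for three candidate phrases in one sentence.
scores = [
    {"WHO": 0.6, "WHAT": 0.3, "NONE": 0.1},
    {"WHO": 0.5, "WHAT": 0.4, "NONE": 0.1},
    {"WHO": 0.1, "WHAT": 0.2, "NONE": 0.7},
]

# Taking each classifier's own best label gives WHO twice, which the
# constraint forbids; the joint search trades off scores to repair that.
local_best = tuple(max(s, key=s.get) for s in scores)
joint_best, joint_score = constrained_argmax(scores, [no_duplicate_roles])
print(local_best, joint_best, joint_score)
```

The local classifiers stay untouched. The constraint is stated separately and declaratively and is applied only at decision time.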
Title: Automatic Subjectivity Analysis
A growing area of research, "subjectivity analysis" is the computational study of affect, opinions, and attitudes expressed in text. Blogs, editorials, reviews (of products, movies, books, etc.), and even "objective" newspaper articles (which include many opinions) are just some of the genres for which accurate identification and interpretation of opinions are critical for full text understanding. Subjectivity analysis will support the development of tools for information analysts in governmental, commercial, and political domains who want to automatically track attitudes and feelings in the news and in online forums. How do people feel about the latest iPod? Is there a change in the support for the new Medicare bill? A system able to automatically identify and extract opinions from text would be an enormous help to someone sifting through the vast amounts of news and web data, trying to answer these kinds of questions. In this talk, I will describe experiments in developing and evaluating automatic systems for subjectivity analysis. In particular, I will describe experiments in recognizing subjective expressions from unannotated data; in recognizing the "contextual polarity" of expressions, i.e., whether a phrase is being used to express a positive or negative opinion, considering the context in which it appears; and in exploring interactions between subjectivity and word sense, showing that subjectivity is a property that can be associated with word meanings and that subjectivity classification can be beneficial for word sense disambiguation.
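The gap between a word's out-of-context polarity and its contextual polarity can be shown with a toy sketch. It is not one of the systems described in the talk: it is a hand-written rule that flips a lexicon polarity when a negation word appears shortly before the expression. The lexicon, the negator list, and the three-token window are all hypothetical.

```python
# Hypothetical prior (out-of-context) polarities: +1 positive, -1 negative.
PRIOR = {"good": 1, "great": 1, "love": 1, "bad": -1, "terrible": -1, "hate": -1}

# Hypothetical list of negation words that can reverse polarity.
NEGATORS = {"not", "never", "no"}


def contextual_polarity(tokens, window=3):
    """Return (token, polarity) pairs for lexicon words found in tokens.

    The prior polarity is flipped when any negator occurs among the
    `window` tokens immediately before the word. This is a crude stand-in
    for "considering the context in which it appears".
    """
    results = []
    for i, token in enumerate(tokens):
        prior = PRIOR.get(token.lower())
        if prior is None:
            continue
        preceding = tokens[max(0, i - window):i]
        negated = any(t.lower() in NEGATORS for t in preceding)
        results.append((token, -prior if negated else prior))
    return results


print(contextual_polarity("the new iPod is not good".split()))
print(contextual_polarity("I love the new iPod".split()))
```

A single rule like this misses many ways that context changes polarity, which is part of why contextual polarity is treated as a recognition problem in its own right.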