Institute for Discrete Sciences Mini-Symposium

September 28, 2007
The DyDAn Center at the Center for Discrete Mathematics and Theoretical Computer Science (DIMACS), Rutgers University

Organizers:
Tami Carpenter, DyDAn/DIMACS, tcar@dimacs.rutgers.edu
Fred Roberts, DyDAn/DIMACS, froberts@dimacs.rutgers.edu

Abstracts:

Eduard Hovy, Information Sciences Institute, University of Southern California

Title: Information Extraction and Annotation as a Methodology for Complex Domain Modeling

In the past, Information Extraction (IE) systems focused on certain well-defined classes of entities (people, organizations, locations, etc.) and on certain easily specified kinds of events. Recently, however, demand for IE seems to have grown, and the kinds of items desired by domain experts seem to be increasingly complex and hard to define. In our experiments in several domains (biomedicine, government intelligence, and eGovernment), we have found that what the domain expert needs from an IE engine is usually neither easy to extract nor even easy to define. In fact, it seems rather common that experts, in trying to specify exactly what they want, go through a process of discovery that frequently leaves them surprised at how little they understood some of the details of their own domain. Our solution has been to engage experts in a process of annotation, in which they highlight examples of what they need in text, followed by joint analysis and decomposition of the complex concepts present. Following this, we train IE engines to extract the desired information, and we use low performance as an indicator of potential problems in the definition of the target concepts. Using this methodology, we have learned to factor some surprisingly sophisticated notions in psychology into extractable types and to identify argumentation structure in emails.


Fred S. Roberts, DIMACS, Rutgers University

Title: Sequential Decision Making Algorithms for Port of Entry Inspection

The problem of inspecting containers at ports of entry for weapons of mass destruction can be formulated as a problem in sequential decision making, where we decide which test to perform on a container based on the outcomes of previous tests. Following the work of Stroud and Saeger at Los Alamos, we investigate the formulation of the port of entry inspection problem as a problem of finding an optimal binary decision tree for an appropriate Boolean decision function. We report on an experimental analysis of the robustness of the conclusions of the Stroud-Saeger analysis and show that the optimal inspection strategy is remarkably insensitive to variations in the parameters needed to apply the Stroud-Saeger method. We also report on new algorithms for finding optimal binary decision trees that are computationally more efficient than those presented by Stroud and Saeger. We achieve these efficiencies through a combination of specific numerical methods for finding optimal thresholds for sensor functions and a binary decision tree search algorithm that operates on a space of potentially acceptable binary decision trees.
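
As a rough illustration of the binary decision tree formulation described in this abstract, the sketch below searches for a tree that minimizes expected testing cost for a toy Boolean decision function. It is not the Stroud-Saeger method or the algorithm from the talk: the sensor costs, the probabilities of a positive reading, the independence of readings, and the decision function itself are all hypothetical, and sensor thresholds and misclassification costs are left out entirely.

```python
from functools import lru_cache
from itertools import product

# Hypothetical sensors: (cost of running the test, probability of a positive reading).
# Readings are assumed independent, which is a simplification.
SENSORS = ((1.0, 0.3), (2.0, 0.2), (5.0, 0.1))


def decision(readings):
    """Toy Boolean decision function (an assumption, not from the talk):
    flag the container if sensor 0 fires and sensor 1 or sensor 2 fires."""
    a, b, c = readings
    return a and (b or c)


def determined(partial):
    """Return the decision if the known readings already fix it, else None.

    `partial` is a tuple with True/False for tested sensors and None otherwise.
    """
    unknown = [i for i, r in enumerate(partial) if r is None]
    values = set()
    for fill in product((False, True), repeat=len(unknown)):
        full = list(partial)
        for i, v in zip(unknown, fill):
            full[i] = v
        values.add(bool(decision(full)))
    return values.pop() if len(values) == 1 else None


@lru_cache(maxsize=None)
def best(partial):
    """Return (expected remaining test cost, tree) for a partial set of readings.

    A tree is either the leaf "flag" / "pass" or a tuple
    (sensor index, subtree if negative, subtree if positive).
    """
    value = determined(partial)
    if value is not None:
        return 0.0, ("flag" if value else "pass")
    options = []
    for i, reading in enumerate(partial):
        if reading is not None:
            continue  # sensor i has already been used on this path
        cost, p_pos = SENSORS[i]
        c_neg, t_neg = best(partial[:i] + (False,) + partial[i + 1:])
        c_pos, t_pos = best(partial[:i] + (True,) + partial[i + 1:])
        expected = cost + p_pos * c_pos + (1 - p_pos) * c_neg
        options.append((expected, i, t_neg, t_pos))
    expected, i, t_neg, t_pos = min(options, key=lambda o: o[:2])
    return expected, (i, t_neg, t_pos)


if __name__ == "__main__":
    expected_cost, tree = best((None, None, None))
    print(expected_cost)
    print(tree)
```

The exhaustive recursion is only practical for a handful of sensors; it is meant to show what "choosing the next test based on previous outcomes" looks like, not how to search the tree space efficiently.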


Dan Roth, University of Illinois at Urbana-Champaign

Title: Global Learning and Inference with Constraints

The maturity of machine learning techniques allows us today to learn many low-level predicates and generate an appropriate vocabulary over which reasoning methods can be used to make significant progress in higher level domain decisions.

I will describe research on a framework that combines learning and inference and exhibit its use in the natural language processing domain. Key in this framework is the ability to incorporate declarative and expressive global information into the learning and decision stage. I will discuss the use of this framework as (1) a way to allow the output of local classifiers for different problem components to be assembled into a whole that reflects global preferences and constraints; (2) a way to improve probabilistic models by enforcing additional expressive constraints and (3) a way to significantly improve semi-supervised training of structured models.

Examples will be drawn from 'wh' attribution in natural language processing (determining who did what to whom when and where) and from information extraction problems.
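
Point (1) of this abstract, assembling local classifier outputs into a whole that respects global constraints, can be illustrated with a very small sketch. Everything in it is hypothetical: the label set, the per-token scores, and the two constraints are made up for illustration, and brute-force enumeration stands in for whatever inference procedure the framework actually uses.

```python
from itertools import product

LABELS = ("AGENT", "ACTION", "OTHER")

# Hypothetical scores from independent local classifiers, one dict per token.
local_scores = [
    {"AGENT": 0.6, "ACTION": 0.1, "OTHER": 0.3},  # "Mary"
    {"AGENT": 0.5, "ACTION": 0.4, "OTHER": 0.1},  # "called"
    {"AGENT": 0.2, "ACTION": 0.1, "OTHER": 0.7},  # "yesterday"
]


def satisfies_constraints(labels):
    """Hypothetical global constraints: at most one AGENT, exactly one ACTION."""
    return labels.count("AGENT") <= 1 and labels.count("ACTION") == 1


def local_argmax(scores):
    """Pick the best label for each token independently, ignoring constraints."""
    return tuple(max(LABELS, key=token.get) for token in scores)


def constrained_argmax(scores):
    """Pick the highest-scoring joint labeling that satisfies the constraints."""
    best_labels, best_total = None, float("-inf")
    for labels in product(LABELS, repeat=len(scores)):
        if not satisfies_constraints(labels):
            continue
        total = sum(token[label] for token, label in zip(scores, labels))
        if total > best_total:
            best_labels, best_total = labels, total
    return best_labels


if __name__ == "__main__":
    print(local_argmax(local_scores))
    print(constrained_argmax(local_scores))
```

In this toy input the independent decisions label two tokens as AGENT, while the constrained search trades a slightly lower local score on the second token for a labeling that is globally consistent.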


Jan Wiebe, University of Pittsburgh

Title: Automatic Subjectivity Analysis

A growing area of research, "subjectivity analysis" is the computational study of affect, opinions, and attitudes expressed in text. Blogs, editorials, reviews (of products, movies, books, etc.), and even "objective" newspaper articles (which include many opinions) are just some of the genres for which accurate identification and interpretation of opinions is critical for full text understanding. Subjectivity analysis will support developing tools for information analysts in governmental, commercial, and political domains who want to automatically track attitudes and feelings in the news and on-line forums. How do people feel about the latest iPod? Is there a change in the support for the new Medicare bill? A system able to automatically identify and extract opinions from text would be an enormous help to someone sifting through the vast amounts of news and web data, trying to answer these kinds of questions. In this talk, I will describe experiments developing and evaluating automatic systems for subjectivity analysis. In particular, I will describe experiments in recognizing subjective expressions from unannotated data; recognizing the "contextual polarity" of expressions, i.e., whether a phrase is being used to express a positive or negative opinion, considering the context in which it appears; and exploring interactions between subjectivity and word sense, showing that subjectivity is a property that can be associated with word meanings and that subjectivity classification can be beneficial for word sense disambiguation.


Document last modified on September 19, 2007.