Title: UMLS: The graph behind the forest
Many biomedical terminologies are organized as trees (e.g., International Classification of Diseases) or polyhierarchical structures (e.g., Medical Subject Headings, Gene Ontology). Browsers developed for these terminologies generally provide tree-like representations. In contrast, terminology integration systems identify concepts shared by several terminologies and create a large graph our of the originally disjoint trees.
We present the Unified Medical Language System (UMLS), which integrates over one million biomedical concepts from more than one hundred terminologies. We show that the resulting graph is broader, deeper and more richly connected than any of its constituent trees. We discuss some of the limitations of the UMLS graph, including presence of cycles, underspecification of the semantics of some links, semantic inconsistency and missing links. Finally, we show how biomedical graphs can be exploited for knowledge discovery.
Title: Representing, Reasoning with and Querying Semantic Graphs in PowerLoom
We have developed a suite of knowledge and link discovery tools called KOJAK that perform tasks such as pattern matching, group detection, anomaly detection and graph and relationship simplification on large and complex semantic graphs. KOJAK is a hybrid system combining symbolic knowledge representation and reasoning (KR&R) technology with statistical graph algorithms from the area of data mining. We describe how PowerLoom, a general-purpose, logic-based KR&R system is used by KOJAK to represent semantic graphs, link them to an ontology and provide reasoning services such as graph normalization, group extension and general graph querying and matching.
Title: Finding patterns in large, real networks
How does the internet look like? Which sub-graphs of such networks deviate from the 'normal' (and are thus suspicious groups of people/nodes/routers)? How do viruses/rumors propagate in a network? How do we scale up all these algorithms, to handle graphs of millions and billions of nodes and edges?
These are the questions we address. Earlier work has shown that the topology of real networks presents some surprising patterns: the Internet topology obeys power-laws, and so does the web topology. Both, along with social networks, they all exhibit 'small world' phenomena. We present a list of such patterns that real graphs follow, as well as fast, scalable algorithms to efficiently process huge datasets, in search of patterns, anomalies and outliers.
Title: Ontologies for Graph Matching: Practice and Potential
In practice, many implementations of graph matching make use of ontologies only in a cursory manner. This talk seeks to:
Title: Information Extraction: Building Graphs from Natural Language Text
The talk will provide an introduction to information extraction and briefly discuss the ACE hierarchy (entities, relations, events) and the problems of training new IE systems.
Title: Some Thoughts on Associating..... Something Really Useful with Graphs
Title: Semantic Hierarchy Analysis Using DAGs and Posets
Order theory is the mathematical theory of hierarchical structures, specifically lattices, partially ordered sets (posets), and other Directed Acyclic Graph (DAG) structures. While hierarchical systems are ubiquitous both in nature and in our computer systems, they play a special role in representing semantic information, specifically in ontological stores of lexical information, relational knowledge representations, non-traditional uncertainty quantifications, and multidimensional databases. In this talk we will survey some issues in our analytical perspective on semantic hierarchies, including the grounding of hierarchies in digraphs and DAGs, the use of poset metrics in ontology analysis, and some speculative thoughts about the use of order theory for OLAP and other multidimensional databases.
Title: Representing Patterns in PHERL
Many Link Analysis (LA) tools have been developed; each has particular strengths in helping analysts acquire patterns or use patterns to detect evidence in data. In order to provide analysts with the benefits from a range of tools, workbench systems are being created that include many pattern-based LA tools. In such systems, different tools must be able to share patterns. However, different tools represent patterns in different ways. PHERL is an interlingua that supports pattern-sharing across a community of pattern-based LA tools. It exploits a graph-based representation for patterns; the semantics of these graphs are defined by equivalent FOL rules. Experimental translation programs have translated example patterns between PHERL and the native representations of several link analysis tools.
Title: Ontology-Assisted Query of Graph Databases
We present our approach to exploiting the semantics of ontologies to assist in querying data graphs. The approach is based on current efforts that define a declarative graph query language that unifies common approaches to querying data graphs. A separate ontology that represents the semantics of the problem domain can augment a graph database, enabling graph queries to be formulated using terms of the ontology as well as terms of the graph database. Preserving the separation between ontology and graph databases provides users with some of the semantic power of ontologies while maintaining the performance advantage and the query paradigm of graph database systems. The presentation will focus on the paradigm difference among ontology query, ontology-assisted query and graph query as well as our reasons to maintain the differences. Finally, the presentation will describe the challenges and future directions of our efforts to create a comprehensive declarative query language for analysis of graph data.
Title: Adding Strength to Links
Most large scale data graphs have relatively small numbers of link types. These restricted relational vocabularies make it easier to form graphs, since the relationships tend to be broad and therefore hold between many things, but they also tend to be relatively unproductive, inferentially, as a result. Cyc has a very large relational vocabulary, with many thousands of relationships, some of them very specific. Many of these relationships are connected to background knowledge that makes them inferentially productive, so it is useful if the more specific relationships can be identified, converting part of the graph into a grounded knowledge base. In this talk, I will discuss a preliminary results from a number of efforts at Cycorp to identify specific relationships both in graphs directly, and in the text from which graphs are often derived.
Title: GEM: A Graph-Based Hierarchical Pattern Language for Imprecise Information Needs
Many domains require data analysis techniques that provide inexact views into the data. Noisy and incomplete data sets, and analysts' imprecise or evolving understanding of their information needs are some of the problem characteristics that require a pattern language that can access data more flexibly than SQL and similar query languages. This talk will discuss GEM (for Graph Edit Model), a language with the requisite flexibility for many such problems. GEM patterns include a semantic graph that represents a prototype of the situation of interest, and a data comparison metric based on graph edit distance to represent allowable deviations from that prototype. We give a description of the constructs of the language -- including higher-order constructs such as hierarchical graphs, disjunction, and cardinality -- and we demonstrate its use through examples taken from the Link Analysis Workbench (LAW) system.