
Our work in this area over the last few years has three main directions. First, we studied knowledge representations and relational learning over structured domains [(Khardon et. al., 1999), (Cumby and Roth, 2000), (Roth and Yi, 2001), (Cumby and Roth, 2002), (Cumby and Roth, 2003), (Cumby and Roth, 2003a), (Roth and Yi, 2005)]. Progress in this direction has led to the development of a description-logic inspired representation that has been the key representation we have been using for the development of the second direction -- our work on an inference models for textual entailment [(Braz et. al., 2005), (Braz et. al., 2005a)]. Finally, we have made a significant progress on the problem of inference with probabilistic first-order representations; we developed the first lifted probabilistic inference algorithm [(Braz et. al., 2005)] -- one that does not need to resort to propositionalization.
Learning becomes easy once the correct input representation has been chosen, for example, one that produces linearly separable point sets. Our work here focuses on automatically generating intermediate representations to aid supervised learning algorithms. Most of the machine learning work in natural language (as in other domains like machine vision) is done over structured input, where attributes, as well as relations among them, need to be encoded as input to the learning algorithm. Most of the computation in machine learning is thus spent at the stage of feature extraction - both off line, in order to determine what types of features are important, and on-line, as the feature extraction process is often more time consuming than training and evaluation.
We developed a representation language for learning over structured domains within a propositionalization framework that decouples feature construction and model construction. We formalized this in two different and complementary ways, that address several aspects of the problem. First, we developed and studied a flexible knowledge representation for structured data, with an associated language that provides the syntax and a well defined equivalent semantics for expressing complex structured data succinctly. Second, we used this language to automate the process of feature construction by expressing types of objects which are instantiated in the ground data, allowing us to determine the level at which learning is done. Finally, this process of re-representation of domain elements allows general purpose learning schemes, such as feature efficient linear algorithms and probabilistic representations, to be defined over the resulting space, yielding efficient and expressive learning of relational functions over a structured domain using efficient propositional means.
The Feature EXtraction language that we designed, FEX, has been made available to the research community, and has been downloaded by a large number of researchers.
Our formalism uses description logics and concept graphs in the service of learning relational models using efficient propositional learning algorithms. We introduced Feature Description Logic (FDL) - a relational (frame based) language that supports efficient inference, along with a generation function that uses inference with descriptions in the FDL to produce features suitable for use by learning algorithms. These are used within a learning framework that is shown to efficiently and accurately learn relational representations in terms of the FDL descriptions. The paradigm was designed and shown effective in support of learning in domains that are relational but where the amount of data and size of representation learned are very large. This paradigm provides a natural solution to the problem of learning and representing relational data; it extends and unifies several lines of works in KRR and Machine Learning in ways that provide hope for a coherent usage of learning and reasoning methods in large scale intelligent inference.
Complementing our work on relational intermediate representations described above is the work on kernels for structured data [(Cumby and Roth, 2003)]. This work addresses also the tradeoff between the computational efficiency with which these kernels can be computed and the generalization ability of the resulting classifier. This study complements our theoretical study of these issues, in the context of studying structured kernels for on-line learning algorithms [(Khardon et. al., 2002), (Khardon et. al., 2005)].
Indicative of the usefulness of our approach -- originally developed in the context of natural language problems -- is its success in facilitating learning in visual recognition problems [(Roth et. al., 2002), (Agarwal and Roth, 2002), (Agarwal et. al., 2004)], done as part of our attempt to study learning in multi-modal environments. The knowledge representation and learning approach are identical in the natural language and visual recognition domains, with the only difference being the sensory level (sensors, formally defined objects within the language as a mapping from observations to "formulas" [(Roth and Yi, 2001)]). Our visual recognition work on relational representations within a discriminaroty learning framework [(Agarwal and Roth, 2002), (Agarwal et. al., 2004)] (also known as a constellation model), has already had a significant impact on this research area.
Semantic entailment is the problem of determining if the meaning of a given
sentence entails that of another. For example, determine if the sentence:
WalMart defended itself in court today against claims that its female
employees were kept out of jobs in management because they are women.
entails the sentence: WalMart was sued for sexual
discrimination.. This is a fundamental problem in natural language
understanding that provides a broad framework for studying language
variability and has a large number of applications. Over the last year we have
started to develop a principled approach to this problem that builds on
inducing representations of text snippets into a hierarchical knowledge
representation along with a sound optimization-based inferential mechanism
that makes use of it to decide semantic entailment.
[(Braz et. al., 2005), (Braz et. al., 2005a)] present a principled computational approach to semantic entailment in natural language, addressing some of the key problems encountered in traditional approaches, such as knowledge acquisition and brittleness. The solution uses a hierarchical knowledge representation language into which we induce appropriate representations of the given text and required background knowledge. The other main element is a sound inferential mechanism that makes use of the induced representation to determine an extended notion of subsumption, using an optimization approach that supports abstracting over language variability and representation inaccuracies.
The knowledge representation we study in the context of this problems is an Extended Feature Description Logic, an extension of the description logic based representation we alluded to earlier as the knowledge representation we developed for structured domains. Our view of inference as an optimization procedure builds on the constrained optimization formalisms described in Sec. [Learning and Inference].
We study inference algorithms and representations for expressive, first order, probabilistic reasoning. This work continues our early, high impact work on the complexity of probabilistic inference [(Roth, 1993), (Roth, 1996)] and attempts to provide solutions to some of the key barriers to the use of expressive probabilistic representations in large scale problems.
Most probabilistic inference algorithms are specified and processed on a propositional level. In the last decade, many proposals for more expressive, first order probabilistic representations were made, but none came with an algorithmic solution to the inference problem. In the inference stage they all still operate on a propositional representation level, resulting in a huge blow-up in the representation size in a way that eliminates any possible gain from the compact first order representation. This makes inference impossible for realistic size problems. Our work is the first exact inference algorithm (following ideas from [(Poole, 2003)] that applied for a special case) that operates directly on a first-order level, and that can be applied to any first-order model (specified in a language that generalizes undirected graphical models). This recent work [(Braz et. al., 2005)] promises to revolutionize inference with large relational probabilistic representations.