Knowledge Representation for Natural Language

Overview:

Knowledge Representation in Natural Language Text Processing has two aspects: representing the analysis of text (for example, part-of-speech tags, named entity labels and boundaries, semantic frames), and representing the meaning of text. Meaning Representations (typically, canonical logical forms) tend to be brittle: while they can perform well in highly constrained domains, they fail on open-domain natural language text because of scalability, ambiguity, and coverage problems. Syntactic-level analysis (such as constituency or dependency parse structure) is more robust, but it is not sufficient on its own for complex NLP tasks such as Question Answering and Semantic Entailment.

MRCS (Modular Representation and Comparison Scheme), which is being developed in the context of our work on Semantic Entailment, is intended to accommodate both kinds of representation. MRCS rests on three main intuitions:

  • It should be possible to leverage semantic analysis where available (and reliable), and back off to shallower representations where it is not. MRCS's data structures treat the two types of representation interchangeably.
  • As researchers identify and find workable solutions to low-level NLP problems, a complex NLP system should grow to accommodate them. MRCS addresses this requirement via the concept of specialized Annotators that can work independently to augment an existing representation of the text of interest.
  • A significant amount of world knowledge can be encapsulated in localized resources that compare spans of text, and their decisions can be reconciled at the global level using machine learning and inference techniques. MRCS addresses this requirement with Comparators: specialized decision components that a global inference mechanism invokes on pairs of text spans, each returning a similarity score.
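
To make the three intuitions above concrete, here is a minimal Python sketch. It is not the MRCS implementation: the names TextRecord, Span, capitalized_entity_annotator, exact_match_comparator, token_overlap_comparator and best_similarity are all hypothetical, and taking the maximum comparator score is only a stand-in for the machine learning and inference step described above. The sketch assumes a representation that stores each analysis as a named layer over the same tokens, so that shallow and deeper layers are handled the same way.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Span:
    """A labeled stretch of tokens (hypothetical type, not from MRCS)."""
    start: int  # token offset, inclusive
    end: int    # token offset, exclusive
    label: str


@dataclass
class TextRecord:
    """Tokens plus any number of named annotation layers.

    Every layer, shallow or semantic, is stored the same way, so a
    consumer can use a deep layer if present and back off otherwise.
    """
    tokens: List[str]
    views: Dict[str, List[Span]] = field(default_factory=dict)

    def text_of(self, span: Span) -> str:
        return " ".join(self.tokens[span.start:span.end])


# An annotator independently adds one layer to an existing record.
Annotator = Callable[[TextRecord], None]


def capitalized_entity_annotator(record: TextRecord) -> None:
    """Toy annotator: marks every capitalized token as an entity."""
    record.views["entities"] = [
        Span(i, i + 1, "ENT")
        for i, tok in enumerate(record.tokens)
        if tok[:1].isupper()
    ]


# A comparator scores the similarity of two text spans in [0, 1].
Comparator = Callable[[str, str], float]


def exact_match_comparator(a: str, b: str) -> float:
    """Returns 1.0 when the spans match ignoring case, else 0.0."""
    return 1.0 if a.lower() == b.lower() else 0.0


def token_overlap_comparator(a: str, b: str) -> float:
    """Jaccard overlap of the lowercased token sets of the two spans."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def best_similarity(a: str, b: str, comparators: List[Comparator]) -> float:
    """Placeholder for global inference: keeps the highest local score."""
    return max(c(a, b) for c in comparators)


record = TextRecord("John bought a car".split())
capitalized_entity_annotator(record)
entity_texts = [record.text_of(s) for s in record.views["entities"]]
score = best_similarity(
    "a car", "a red car", [exact_match_comparator, token_overlap_comparator]
)
```

In this sketch, adding a new analysis means writing one more function that fills a new key in views, and adding a new knowledge resource means writing one more comparator; neither requires changes to existing components.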

Relevant Publications: