Context Sensitive Paraphrasing

Overview:

Lexical paraphrasing (replacing one word with another) is an inherently context-sensitive problem because a word's meaning depends on its context. We have developed a global classifier that takes a verb v and its context (the sentence in which v appears), along with a candidate verb u, and determines whether u can replace v in that sentence while preserving the original meaning. Training data for the classifier is generated without supervision, so we are able to build classifiers for specialized domains from large amounts of text, either newswire (first link) or biomedical abstracts (GENIA, second link).

Details:

This project attempts to answer the question: when can one word replace another in a given context? The question arises naturally as a subproblem of various NLP tasks, such as query expansion and entailment. In these tasks we cannot rely solely on a fixed set of synonyms: the list might not be appropriate for every meaning of the target word, and in the case of entailment the synonyms might not be general enough to capture the meaning implied in the given context.

Standard paraphrasing solutions to this problem generate lists of rules that specify which words can replace which others, and when. In the simplest case the rules are just a list of synonyms, as mentioned above, treated as interchangeable in every context. Applying this approach broadly requires rules for every word we might encounter and a good encoding of context for judging when a rule applies. Another view is that the problem is related to word sense disambiguation. If we knew the correct sense of the target word in the given context, we could deduce (from a sense repository such as WordNet) whether the new word can have the same or a related sense in that context. However, identifying the correct sense is a hard problem in itself, and even if it were solved, no sense repository is complete enough to list every other sense and word that can replace a given sense. For our approach we decided not to rely on any static sense representation and to use an architecture that can be applied to all words, even those unseen in training.
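To make the limitation of a fixed synonym list concrete, here is a minimal sketch of the simplest rule-based approach. The synonym table and the function name are hypothetical and exist only for illustration; they are not part of this project.

```python
# Hypothetical, hand-written synonym table (illustration only).
SYNONYMS = {
    "made": {"baked", "earned", "built"},
}

def rule_based_can_replace(sentence, v, u):
    """Context-free rule: u may replace v whenever u is listed as a synonym of v.

    The sentence argument is accepted but deliberately ignored, which is
    exactly the weakness of a plain synonym list.
    """
    return u in SYNONYMS.get(v, set())

sentence = "she made a cake".split()
print(rule_based_can_replace(sentence, "made", "baked"))   # True
print(rule_based_can_replace(sentence, "made", "earned"))  # True, although "she earned a cake" changes the meaning
```

Because the rule never looks at the sentence, it accepts "earned" for "made" in every context, and it has nothing to say about words missing from the table.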

We phrase the problem as binary classification: given a word V in a sentence S and a second word U, can U replace V in S while preserving the original meaning or a meaning entailed by it? This is a single yes/no question that can be asked of any triple (S, V, U). By using a feature representation that captures the similarities and differences between the usage of U and V in a large body of text, and how those usage contexts compare to S, we can build a single binary classifier for all U and V. Because the feature representation depends on features shared by the two words (such as the contexts in which both appear) rather than on the words themselves, the classifier can generalize to unseen word pairs.
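The formulation above can be sketched roughly as follows. This is not the project's actual feature set or learner: the toy corpus, the three features, and the hand-set weights (standing in for a trained model) are all assumptions chosen for illustration. The point it shows is that the features describe the relationship between U, V, and S, never the identity of the words, so one classifier serves every pair.

```python
import math
from collections import Counter

def context_profile(word, corpus):
    """Count the words that co-occur with `word` in the same corpus sentence."""
    profile = Counter()
    for sent in corpus:
        if word in sent:
            profile.update(tok for tok in sent if tok != word)
    return profile

def cosine(a, b):
    """Cosine similarity between two count vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(x * x for x in a.values())) * math.sqrt(sum(x * x for x in b.values()))
    return dot / norm if norm else 0.0

def coverage(context, profile):
    """Fraction of the sentence's context words that were seen with the word."""
    if not context:
        return 0.0
    return sum(1 for tok in context if tok in profile) / len(context)

def pair_features(sentence, v, u, corpus):
    """Features of the (S, V, U) triple; none of them encodes the words themselves."""
    pv, pu = context_profile(v, corpus), context_profile(u, corpus)
    context = [tok for tok in sentence if tok != v]
    return [cosine(pu, pv), coverage(context, pu), coverage(context, pv)]

# Hypothetical weights and bias, set by hand in place of a trained model.
WEIGHTS = [1.0, 2.0, 0.5]
BIAS = -2.0

def can_replace(sentence, v, u, corpus):
    """Single yes/no decision for any triple (S, V, U)."""
    feats = pair_features(sentence, v, u, corpus)
    return BIAS + sum(w * f for w, f in zip(WEIGHTS, feats)) > 0

# Toy corpus (assumption): six tokenized sentences.
corpus = [s.split() for s in [
    "she made a cake", "he baked a cake", "she baked bread",
    "they made a profit", "they earned a profit", "he earned money",
]]

print(can_replace("she made a cake".split(), "made", "baked", corpus))      # True
print(can_replace("she made a cake".split(), "made", "earned", corpus))     # False
print(can_replace("they made a profit".split(), "made", "earned", corpus))  # True
```

In this toy setting "baked" and "earned" are equally similar to "made" overall; what separates them is how well the contexts of each candidate match the specific sentence, which is the context-sensitive part of the decision.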