Machine Learning and Natural Language

Fall 2008

Experimental Assignment II                    Textual Entailment (Due 4/25/08 and 2/5/08)

General

Textual Entailment

Textual Entailment is the task of determining, for example, that the sentence "WalMart defended itself in court today against claims that its female employees were kept out of jobs in management because they are women" entails that "WalMart was sued for sexual discrimination". Determining whether the meaning of a given text snippet entails that of another, or whether they have the same meaning, is a fundamental problem in natural language understanding that requires the ability to abstract over the inherent syntactic and semantic variability in natural language. This challenge is at the heart of many high-level natural language processing tasks, including Question Answering, Information Retrieval and Extraction, Machine Translation, and others that attempt to reason about and capture the meaning of linguistic expressions.

This task was defined only recently, but it is already a very active area of research.

Recommended Reading on Textual Entailment:

The goal of this assignment is for you to think about and implement a strategy for deciding Textual Entailment. Given a pair of sentences, t and h, you want to determine if t entails h. The goal is to return True (entails) or False, along with a confidence in this decision.
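
As a starting point, the input/output contract described above (a pair t, h in; a True/False decision and a confidence out) can be sketched with a simple word-overlap baseline. This is only an illustrative sketch, not a recommended strategy: the function names `tokenize` and `overlap_entails`, the 0.6 threshold, and the way the confidence is derived from the overlap score are all assumptions made for this example.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_entails(t, h, threshold=0.6):
    """Return (decision, confidence) for whether t entails h.

    The score is the fraction of h's tokens that also occur in t.
    Both the threshold and the confidence definition are arbitrary
    choices for illustration, not part of the assignment.
    """
    t_tokens = set(tokenize(t))
    h_tokens = tokenize(h)
    if not h_tokens:
        # An empty hypothesis is treated as trivially covered.
        return True, 1.0
    score = sum(tok in t_tokens for tok in h_tokens) / len(h_tokens)
    decision = score >= threshold
    # Confidence: the overlap itself for True, its complement for False.
    confidence = score if decision else 1.0 - score
    return decision, confidence


if __name__ == "__main__":
    t = ("WalMart defended itself in court today against claims that its "
         "female employees were kept out of jobs in management because "
         "they are women")
    h = "WalMart was sued for sexual discrimination"
    print(overlap_entails(t, h))
```

On the WalMart example above, only one of the six hypothesis tokens ("walmart") appears in the text, so this baseline answers False, which illustrates why purely lexical matching is unlikely to be enough.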


The Assignment

You will be given two collections of (t, h) pairs. The first collection consists of the development sets for the three RTE challenges; you can use it to develop your strategy: look at the data, study it in different ways, train classifiers on it if you'd like, etc.

The second collection of pairs, the test set, consists of the test sets of all three RTE challenges. Your goal is to achieve good results on the test sets. However, you will evaluate and report your results on both the development and test sets, with separate results for each of the RTE collections.
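
The per-collection reporting described above can be sketched as follows. This is a hedged illustration rather than the official scoring: `accuracy`, `confidence_weighted_score`, and `report` are hypothetical names, and the confidence-weighted score shown here (the running accuracy averaged over pairs ranked by decreasing confidence) is one common way to take confidence into account. The evaluation script provided with the data is the authoritative definition.

```python
def accuracy(gold, predicted):
    """Fraction of pairs whose predicted label matches the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def confidence_weighted_score(gold, predicted, confidence):
    """Average, over ranks, of the accuracy among the top-ranked pairs.

    Pairs are ranked by decreasing confidence, so correct decisions made
    with high confidence contribute more than correct low-confidence ones.
    """
    if not (len(gold) == len(predicted) == len(confidence)):
        raise ValueError("all inputs must have the same length")
    order = sorted(range(len(gold)), key=lambda i: -confidence[i])
    correct = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        correct += gold[i] == predicted[i]
        total += correct / rank
    return total / len(order) if order else 0.0


def report(results):
    """Map each collection name to its (accuracy, confidence-weighted score).

    `results` maps a name such as "RTE1-dev" (a hypothetical label) to a
    (gold, predicted, confidence) triple of equal-length lists.
    """
    return {
        name: (accuracy(g, p), confidence_weighted_score(g, p, c))
        for name, (g, p, c) in results.items()
    }
```

Calling `report` once with six entries (development and test for each of the three RTE collections) yields the separate figures the report should contain.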

Both collections are annotated with a task and with the entailment classification (True/False); needless to say, the annotation of the test set will be used only for evaluating your results.

Data

The three development sets and the three test sets are available here. Please search for RTE and you will find development and test data for RTE{1,2,3}.

In addition to the raw sentence pairs (in XML files), the data has been processed by a semantic role labeling program and a few other tools, and is available in a "column format". This file explains the column format. An evaluation script is available here.

Please note that the data was processed with an earlier version of the semantic role labeler; you can process the data yourself via this tool, or with any other tools.
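
For the raw XML files, a minimal reader might look like the sketch below. The layout it expects is an assumption, not something this handout specifies: each `<pair>` element is assumed to carry `id` and `task` attributes, a gold label in either an `entailment` attribute (YES/NO) or a `value` attribute (TRUE/FALSE), and `<t>` and `<h>` child elements. Check the actual files before relying on it; `read_pairs` is a hypothetical helper name, and the sample document is made up for illustration.

```python
import xml.etree.ElementTree as ET


def read_pairs(xml_text):
    """Parse (t, h) pairs from an RTE-style XML string.

    Assumed layout (verify against the real files):
      <pair id="..." task="..." entailment="YES|NO">  or  value="TRUE|FALSE"
        <t>text</t><h>hypothesis</h>
      </pair>
    """
    root = ET.fromstring(xml_text)
    pairs = []
    for pair in root.iter("pair"):
        label = pair.get("entailment") or pair.get("value")
        pairs.append({
            "id": pair.get("id"),
            "task": pair.get("task"),
            "t": (pair.findtext("t") or "").strip(),
            "h": (pair.findtext("h") or "").strip(),
            # None when the file carries no gold label for this pair.
            "gold": label.upper() in ("YES", "TRUE") if label else None,
        })
    return pairs


if __name__ == "__main__":
    # Made-up sample in the assumed layout, for illustration only.
    sample = """<entailment-corpus>
      <pair id="1" entailment="YES" task="IE">
        <t>WalMart defended itself in court today.</t>
        <h>WalMart was in court.</h>
      </pair>
    </entailment-corpus>"""
    for p in read_pairs(sample):
        print(p["id"], p["task"], p["gold"], p["h"])
```

Keeping the task label alongside each pair makes it easy to break results down by task as well as by collection.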

Some Tools

Deliverables

Report

  1. Describe what you did, the specifics of your resources, algorithms and experiments.
  2. Conclude with some suggestions for improvements, future work, etc.
  3. Send me only your report (no longer than 10 pages; 11-point font; PDF file), but be ready with a package of the code in case you need to show something about it.

Grading

Your grade depends on:
  1. The quality of your report.
  2. The quality of your results.
  3. Your originality in going beyond the minimal requirements.
  4. The quality of your final presentation.

Due date

Friday, May 2nd. Presentation on May 5th.
Dan Roth