
Some of the resources below are copyrighted (marked with a '+') and require a password to access.
To use these resources, you must be a member of the Linguistic Data Consortium (LDC) or have purchased
a license for the relevant resource. If you are a student at the University of Illinois at Urbana-Champaign and
need to use these resources, you may be covered by an existing license; check with your professor, then email
us to request access.
IT IS ILLEGAL TO SHARE A COPYRIGHTED LDC RESOURCE WITH PEOPLE OR ORGANIZATIONS WHO DO NOT HAVE EITHER AN LDC MEMBERSHIP OR A LICENSE
TO USE THE COPYRIGHTED RESOURCE.
For large corpora, the links below take you to a directory structure rather than a single file; to download the data, you can use a tool such as wget.
The entailment corpora from the three PASCAL Recognizing Textual Entailment (RTE) challenges are provided here in a column format that encodes a range of annotations of the original text. The original corpora can be accessed from the NIST Text Analysis Conference RTE track web site. A separate documentation file explains the column format. The PARC sentence pairs were provided separately by Xerox PARC.
For a more extensive set of examples testing the kinds of phenomena modeled in the PARC dataset, take a look at the FRACAS dataset provided by Bill MacCartney of the Stanford University NLP Group.
The corpus and annotation guidelines developed for the paper by V. Srikumar, R. Reichart, M. Sammons, A. Rappoport, and D. Roth, "Extraction of Entailed Semantic Relations Through Syntax-Based Comma Resolution", Proc. of the Annual Meeting of the ACL (2008), can be downloaded for research use via the link below.
Comma resolution data. If you use this data, please cite the work referenced above.
UIUC Image Database for Car Detection.
This data was used in the research described in the paper "Learning to Detect Objects in Images via a Sparse, Part-Based Representation". The software used in this research can also be downloaded here. If you use this data or the code provided, please cite the above work.
If you are UIUC faculty or working on a UIUC-supported project, you can access a number of copyright-protected corpora that are used by UIUC researchers. You can obtain the username and password by emailing mssammon curly-a illinois period edu.
Note that these will soon be merged into the shared corpora directory linked to above.
Information about other corpora that may be available to students and faculty working at UIUC can be found here.