Data for Information Extraction
Seminar Announcements
- Collected and annotated by Dayne Freitag at CMU
- The original data can be found here.
- The preprocessed data can be found here.
Computer-related Jobs
- Collected and annotated by Mary Elaine Califf at the University of Texas at Austin
- The original data can be found here.
- The preprocessed data can be found here.
Format of the preprocessed data
- The data is stored in a column (table) format that Fex can read directly.
- Each sentence is stored as a table with one row per word. Sentences are separated by an empty line in the file.
- Column 1: the label of the target fragments, in BIO format.
- Column 2: the named entity tags.
- Column 3: the word numbers.
- Column 4: the noun phrases, also in BIO format.
- Column 5: the part-of-speech tags.
- Column 6: the words.
- Columns 7, 8, and 9: not used.
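The format above can be read with a short script. The following is a minimal sketch, not a tool shipped with the data. It assumes that columns are separated by whitespace, that blank lines separate sentences, and that BIO labels are written as `B-type`, `I-type`, or `O`. The function names `read_sentences`, `bio_spans`, and `fragments` are hypothetical. The script groups rows into sentences, then decodes column 1 into the labeled target fragments using the words in column 6.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

# Assumed 0-based column positions, following the column list above.
LABEL, NE_TAG, WORD_NUM, NP_CHUNK, POS, WORD = range(6)

Row = List[str]


def read_sentences(lines: Iterable[str]) -> Iterator[List[Row]]:
    """Yield each sentence as a list of rows (one list of column strings per word).

    Assumes whitespace-separated columns and blank lines between sentences.
    """
    sentence: List[Row] = []
    for line in lines:
        line = line.strip()
        if not line:
            # A blank line ends the current sentence, if there is one.
            if sentence:
                yield sentence
                sentence = []
            continue
        sentence.append(line.split())
    if sentence:  # the file may not end with a blank line
        yield sentence


def bio_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Decode BIO tags into (type, start, end) spans, with end exclusive.

    An 'I-' tag that does not continue a span of the same type starts a new span.
    """
    spans: List[Tuple[str, int, int]] = []
    cur_type: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != cur_type):
            # Close any open span, then open a new one at position i.
            if cur_type is not None:
                spans.append((cur_type, start, i))
            cur_type, start = tag[2:], i
        elif tag.startswith("I-"):
            continue  # the current span continues
        else:
            # 'O' (or anything else) closes the current span.
            if cur_type is not None:
                spans.append((cur_type, start, i))
            cur_type = None
    if cur_type is not None:
        spans.append((cur_type, start, len(tags)))
    return spans


def fragments(sentence: List[Row]) -> List[Tuple[str, str]]:
    """Return (label type, text) pairs for the target fragments in one sentence."""
    labels = [row[LABEL] for row in sentence]
    words = [row[WORD] for row in sentence]
    return [(t, " ".join(words[s:e])) for t, s, e in bio_spans(labels)]
```

As a usage example, `for sent in read_sentences(open(path)): print(fragments(sent))` prints the fragments of each sentence, where `path` is a placeholder for one of the preprocessed files. The same `bio_spans` helper can decode the noun-phrase column (column 4), because that column also uses BIO format.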