The column format we use for entailment corpora encodes the following information (columns are numbered from 1 to N, with 1 being the leftmost):

Column 1: Parse dependency tree information, in the form Phrasetype/Phrasehead index.
  -- A LEAF entry may be prefixed to indicate a leaf node; this is not done if the leaf node constitutes a recognized phrase type.
Column 2: Named entity information: {B|I|O}-NE1/NE2...
  -- More than one NE may be tagged.
  -- BIO (Begin/Inside/Outside) format is used to indicate multi-word NE boundaries.
Column 3: Word index (from zero)
Column 4: Part of speech
Column 5: Word
Column 6: Verb lemma
Columns 7+: Semantic role label (SRL) information.
  -- Each column applies to one verb in the sentence (whose lemma appears in column 6).
  -- The format here is similar to the dependency tree format of column 1.
  -- ABC/N: ABC is the tag, and N is the index of the sentence constituent that ABC relates to.
  -- Arguments of verbs: the head is tagged ARG0-5 or AM_XYZ; these roughly correspond to subject, direct object, adverbial phrases, etc. (see the PropBank annotation details).
  -- For an argument head, the index points to the verb of which this word is an argument.
  -- Other constituents of an argument are tagged MOD_A0 etc., and their index points to the designated head of their argument. (FYI, full parse dependency information is used to identify the heads of multi-word SRL constituents.)
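As a minimal sketch of how rows in this format might be read, the Python below splits each line into the documented columns and collects the argument heads for one verb column. It rests on assumptions the description above does not state: columns are whitespace-separated, sentences are separated by blank lines, the LEAF marker is written as the hypothetical prefix `LEAF-`, and any column entry that is not of the form TAG/N (for example `-`) means "no tag". The names `Token`, `parse_line`, `parse_sentences` and `argument_heads` are hypothetical helpers, not part of any existing tool.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional, Tuple

# Hypothetical spelling of the LEAF marker in column 1; adjust to the real corpus.
LEAF_PREFIX = "LEAF-"

@dataclass
class Token:
    dep: Optional[Tuple[str, int]]        # column 1: (phrase type, phrase head index)
    is_leaf: bool                         # column 1 carried the LEAF prefix
    ne_bio: str                           # column 2: "B", "I" or "O"
    ne_types: List[str]                   # column 2: NE1/NE2/... (empty for "O")
    index: int                            # column 3: word index, from zero
    pos: str                              # column 4: part of speech
    word: str                             # column 5: the word itself
    lemma: str                            # column 6: verb lemma
    srl: List[Optional[Tuple[str, int]]]  # columns 7+: one (tag, index) per verb

def parse_tag_index(value: str) -> Optional[Tuple[str, int]]:
    """Split 'TAG/N' on the last slash; return None if the entry has no such shape."""
    tag, sep, idx = value.rpartition("/")
    if not sep or not tag:
        return None
    try:
        return tag, int(idx)
    except ValueError:
        return None

def parse_line(line: str) -> Token:
    """Parse one whitespace-separated row into a Token."""
    cols = line.split()
    if len(cols) < 6:
        raise ValueError(f"expected at least 6 columns, got {len(cols)}: {line!r}")
    dep_field = cols[0]
    is_leaf = dep_field.startswith(LEAF_PREFIX)
    if is_leaf:
        dep_field = dep_field[len(LEAF_PREFIX):]
    # Column 2: BIO prefix, then slash-separated NE types, e.g. "B-PER/ORG".
    bio, _, ne_rest = cols[1].partition("-")
    return Token(
        dep=parse_tag_index(dep_field),
        is_leaf=is_leaf,
        ne_bio=bio,
        ne_types=ne_rest.split("/") if ne_rest else [],
        index=int(cols[2]),
        pos=cols[3],
        word=cols[4],
        lemma=cols[5],
        srl=[parse_tag_index(c) for c in cols[6:]],
    )

def parse_sentences(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Group rows into sentences, assuming blank lines separate sentences."""
    sentence: List[Token] = []
    for line in lines:
        if line.strip():
            sentence.append(parse_line(line))
        elif sentence:
            yield sentence
            sentence = []
    if sentence:
        yield sentence

def argument_heads(sentence: List[Token], verb_col: int) -> List[Tuple[str, str]]:
    """Return (tag, word) for argument heads (ARG0-5, AM_XYZ) in one SRL column.

    verb_col is 0-based among columns 7+. MOD_* entries are skipped because
    their index points at their argument's head, not at the verb.
    """
    heads = []
    for tok in sentence:
        entry = tok.srl[verb_col] if verb_col < len(tok.srl) else None
        if entry and entry[0].startswith(("ARG", "AM_")):
            heads.append((entry[0], tok.word))
    return heads
```

For a hypothetical sentence "Big John ate apples" whose SRL column tags "Big" as `MOD_A0/1`, "John" as `ARG0/2` and "apples" as `ARG1/2`, `argument_heads` for that column would return John (ARG0) and apples (ARG1) and skip "Big", since its index points at John rather than at the verb. Keeping MOD_* entries out of the head list mirrors the split the description draws between argument heads and the other constituents of an argument.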