NLP Tools

The tools on this page are useful for a variety of text processing tasks, such as converting raw text to a form suitable for FEX or calling CogComp servers to tag text. They are provided here as a convenience for developers and as a courtesy to users of our tools.

These tools are under development and are provided as is; they work on the systems on which they were created, but we make no guarantees that they will work on others. We also will not accept responsibility for any problems that may arise from using these tools.

Verb Tense Changer  

This tool changes the inflected form of a verb, e.g. from third person singular ("walks") to present participle ("walking").

 

HTML Tag Stripper  

This tool retrieves a page in HTML format and extracts its text content by stripping the HTML tags.
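
The tag-stripping step can be illustrated with a minimal Python sketch. This is not the tool's own code; it only assumes the standard library's `html.parser` module, and the names `TextExtractor` and `strip_tags` are made up for this example. Retrieving the page itself (for instance with `urllib.request.urlopen`) is left out.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self.skip_depth == 0:
            self.parts.append(data)


def strip_tags(html):
    """Return the text content of an HTML string with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Joining with spaces keeps neighbouring block elements apart, at the
    # cost of splitting a word that has an inline tag in the middle of it.
    return " ".join(" ".join(parser.parts).split())


if __name__ == "__main__":
    page = "<html><body><h1>NLP Tools</h1><p>Plain <b>text</b> only.</p></body></html>"
    print(strip_tags(page))
```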

Sentence Segmentation Tool  

This sentence segmentation tool reads plain text and rewrites it with one sentence per line.
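
As a rough illustration of what one-sentence-per-line output means, here is a minimal Python sketch. It is not the tool's own method: it assumes a naive rule (a sentence ends at ".", "!" or "?" followed by whitespace and then an uppercase letter, digit, quote or opening parenthesis), and the name `split_sentences` is made up for this example.

```python
import re

# Assumed boundary rule: sentence-final punctuation, then whitespace, then
# something that looks like the start of a new sentence.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z0-9\"'(])")


def split_sentences(text):
    """Return the sentences found in text, in order."""
    # Collapse line breaks and repeated spaces before splitting.
    text = " ".join(text.split())
    return [sentence for sentence in _BOUNDARY.split(text) if sentence]


if __name__ == "__main__":
    raw = "The tool reads plain text. It writes one sentence\nper line! Is that all? Yes."
    print("\n".join(split_sentences(raw)))
```

A rule this simple will wrongly split after abbreviations such as "Dr." when a capitalized word follows, which is one reason a dedicated segmentation tool is useful.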

 

Word Splitter  

The word splitter is a segmentation script that reads plain text (one sentence per line) and writes it out with a space separating every word and punctuation mark (this format is needed by tools such as the POS tagger).
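
The output format can be sketched in a few lines of Python. This is an illustration only, not the script itself: the token pattern and the names `split_words` and `split_lines` are assumptions made for this example, and the real tool may treat contractions, hyphens or numbers differently.

```python
import re

# Assumed token pattern: a run of word characters (allowing internal
# apostrophes and hyphens), or any single non-space punctuation character.
_TOKEN = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")


def split_words(sentence):
    """Return the sentence with one space between every word and punctuation mark."""
    return " ".join(_TOKEN.findall(sentence))


def split_lines(text):
    """Apply split_words to each line of one-sentence-per-line text."""
    return "\n".join(split_words(line) for line in text.splitlines())


if __name__ == "__main__":
    print(split_words("Hello, world! It's (almost) done."))
```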

FEX input preprocessor: chunks to columns  

This tool takes text output from the shallow parser (chunker) and converts it to column format.

 

SNoW Statistics Summarizer  

This tool summarizes SNoW output statistics for each label in a given task.

Collins Parser / FEX Translators  

These tools convert column format data to the format required by Collins' Parser, and convert the output of Collins' Parser to a column format similar to that used by FEX.

 

Preprocessor  

This tool takes plain text and adds part-of-speech (POS), shallow parse and named entity tags.

F1 calculator (Shallow Parser)  

These scripts calculate precision, recall and F1 values for bracketed data (Shallow Parser format).
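
The computation can be sketched in Python as follows. This is not the scripts' own code. It assumes bracketed input of the form `[NP The dog ] [VP barks ] .`, with brackets separated from words by spaces, and it counts a predicted chunk as correct only if its label and both boundaries match a gold chunk. The names `read_chunks` and `precision_recall_f1` are made up for this example.

```python
def read_chunks(line):
    """Parse one bracketed line into a set of (label, start, end) token spans.

    The input format is an assumption: '[NP The dog ] [VP barks ] .'
    """
    chunks, position, open_chunk = set(), 0, None
    for item in line.split():
        if item.startswith("[") and len(item) > 1:
            # Opening bracket with its label, e.g. "[NP".
            open_chunk = (item[1:], position)
        elif item == "]" and open_chunk is not None:
            label, start = open_chunk
            chunks.add((label, start, position))
            open_chunk = None
        else:
            # An ordinary token advances the token offset.
            position += 1
    return chunks


def precision_recall_f1(gold, predicted):
    """Return (precision, recall, F1) for two collections of chunks."""
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = read_chunks("[NP The dog ] [VP barks ] [ADVP loudly ] .")
    predicted = read_chunks("[NP The dog ] [VP barks loudly ] .")
    print(precision_recall_f1(gold, predicted))
```

In this example one of the two predicted chunks matches one of the three gold chunks, so precision is 1/2, recall is 1/3 and F1 is their harmonic mean, 0.4.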

 

FEX lexicon pruner  

The lexicon pruner removes redundant entries in FEX's lexicon file.

Bibfile Sorter  

This tool sorts bibitems in a bibfile.

 

SNoW Tuner  

This script tries every combination of the parameter settings you give it, training a SNoW network for each combination and evaluating it on a test set. It then reports the parameters that gave the best performance.
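
The search strategy described here is an exhaustive grid search, which can be sketched in Python as follows. This is not the tuner itself and it does not call SNoW: `train_and_evaluate` is a hypothetical callback that would train a network with the given settings and return its score on the test set, and the parameter names in the usage example are placeholders rather than actual SNoW options.

```python
from itertools import product


def grid_search(param_grid, train_and_evaluate):
    """Try every combination in param_grid and return (best_params, best_score).

    param_grid maps a parameter name to the list of values to try.
    train_and_evaluate is a hypothetical callback: it takes a dict of
    settings and returns a score where higher is better.
    """
    names = list(param_grid)
    best_params, best_score = None, None
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = train_and_evaluate(params)
        if best_score is None or score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


if __name__ == "__main__":
    # Toy stand-in for training and testing; a real callback would run the learner.
    def toy_score(params):
        return -(abs(params["rate"] - 0.1) + abs(params["rounds"] - 2))

    grid = {"rate": [0.01, 0.1, 1.0], "rounds": [1, 2, 3]}
    print(grid_search(grid, toy_score))
```

The number of runs is the product of the number of values given for each parameter, so the cost grows quickly as more parameters are added.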