Tutorial:
Machine Learning Tools in Natural Language Processing
This tutorial explores the use of SNoW and FEX, two of our core
machine learning tools, to solve text processing problems. Specifically,
we apply SNoW and FEX to context-sensitive spelling correction and named entity tagging.
The tutorial also uses several of our other standard tools (such as
our Part-of-Speech Tagger)
and custom scripts to preprocess and postprocess data for each
task. Finally, it mentions ways to streamline the process using Perl and shell
scripts and the server modes of SNoW and FEX.
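To give a rough idea of what feature extraction for context-sensitive spelling involves, here is a minimal Python sketch. It is not FEX's scripting language or SNoW's input format: the confusion set, the function names, and the feature strings (such as w[-1]=over) are all assumptions made up for illustration. It collects the words within a small window around each confusion-set word, which is the kind of evidence a classifier could use to choose between the candidates.

```python
# Illustrative sketch only: this is NOT FEX or SNoW. All names and the
# feature-string format below are hypothetical.
from typing import List, Tuple

# Hypothetical confusion set: words that are easily mistaken for each other.
CONFUSION_SET = {"their", "there"}


def context_features(tokens: List[str], index: int, window: int = 2) -> List[str]:
    """Return position-tagged words within `window` tokens of tokens[index]."""
    features = []
    for offset in range(-window, window + 1):
        if offset == 0:
            continue  # skip the target word itself
        pos = index + offset
        if 0 <= pos < len(tokens):
            # Tag each context word with its signed offset from the target.
            features.append(f"w[{offset:+d}]={tokens[pos].lower()}")
    return features


def extract_examples(sentence: str, window: int = 2) -> List[Tuple[str, List[str]]]:
    """Return one (label, features) pair per confusion-set word in the sentence."""
    tokens = sentence.split()  # naive whitespace tokenization
    examples = []
    for i, tok in enumerate(tokens):
        if tok.lower() in CONFUSION_SET:
            examples.append((tok.lower(), context_features(tokens, i, window)))
    return examples


if __name__ == "__main__":
    for label, feats in extract_examples("I left it over there on the table"):
        print(label, feats)
```

In the tutorial itself this step is handled by FEX, and the resulting examples are passed to SNoW for training and testing; the sketch only shows the shape of the idea.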
Tutorial Slides
Under each session heading are links to slides in PowerPoint and PDF formats.
NOTE: some slides will not display correctly in PDF format.
The resources to accompany each session are provided further down the page.
Session 1: Text Processing and Feature Extraction
- Introduction; Preprocessing; Feature Extraction
ppt
pdf
- Multi-class Classification with SNoW
ppt
pdf
Session 2: Applying FEX and SNoW to Named Entity Tagging
- Candidate Selection and Feature Extraction: Named Entity Tagging
ppt
pdf
Session 3: Learning Based Java
- The Basics
ppt
pdf
- In the resources section below, see the User's Manual and the README
in the "toy context sensitive spelling corrector" for more information.
Tutorial Resources
For each tutorial session below, you will find links to the software and data you need,
script files that detail command line usage, and (where appropriate) helper scripts.
NOTE: The text files next to the tool links assume you have downloaded and unpacked (unzipped or untarred) the relevant
package. They walk you through installation on the computers in the SC lab we are using for
the tutorial, and show a sample run. They are NOT executable: they are intended as examples
for you to follow. (If you are interested, they were generated with the Unix 'script' command, hence the suffix '.script'.)
Session 1: Context Sensitive Spelling
Session 2: Named Entity Tagging with FEX and SNoW
Session 3
Miscellaneous resources
More details about SNoW, including its many useful tuning parameters and support for
inference, can be found in the comprehensive
SNoW user manual.
FEX's user manual is included in its
distribution tarball.
In addition to explaining FEX's scripting language and input
formats, this manual gives details of other specialized FEX modes.
The following resources demonstrate FEX's document mode, which is not covered in this tutorial.
If you have questions about these materials, particularly if you are attending the current tutorial sessions, please contact me at mssammon@uiuc.edu. Please also write if you find incorrectly labeled resources,
errors, broken links, etc.