The SNoW POS tagger

It is difficult to write down by hand the rules that determine the part-of-speech (POS) tag of a word in its context. Instead, we use learning techniques, based on the SNoW learning architecture, to generate those "rules". Our system reads many correctly tagged sentences as training data and learns a function that can then be used to POS-tag any English sentence.

SNoW is a learning architecture tailored for learning in the presence of a very large number of information sources (features). SNoW learns a network of linear functions. For the POS tagger, each target node in this network corresponds to a distinct part of speech. Each part of speech is represented as a linear function of the words in the sentence and the POS tags of the words in the neighborhood of the target word.
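The idea of one linear function per part of speech can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the feature names, the `LinearTagNetwork` class, the toy training data, and the simple mistake-driven additive (perceptron-style) update, which is a stand-in and not SNoW's own update rule or API.

```python
from collections import defaultdict


def extract_features(words, tags, i):
    """Active features for position i: nearby words and the tag to the left."""
    feats = [f"w0={words[i].lower()}"]
    if i > 0:
        feats.append(f"w-1={words[i - 1].lower()}")
        feats.append(f"t-1={tags[i - 1]}")
    else:
        feats.append("w-1=<s>")
        feats.append("t-1=<s>")
    if i + 1 < len(words):
        feats.append(f"w+1={words[i + 1].lower()}")
    else:
        feats.append("w+1=</s>")
    return feats


class LinearTagNetwork:
    """One sparse linear function (target node) per POS tag."""

    def __init__(self, tags):
        self.weights = {t: defaultdict(float) for t in tags}

    def score(self, tag, feats):
        w = self.weights[tag]
        return sum(w.get(f, 0.0) for f in feats)

    def predict(self, feats):
        # Highest-scoring target node wins; ties go to the first tag.
        return max(self.weights, key=lambda t: self.score(t, feats))

    def update(self, feats, gold):
        # Illustrative perceptron-style update, applied only on a mistake.
        pred = self.predict(feats)
        if pred != gold:
            for f in feats:
                self.weights[gold][f] += 1.0
                self.weights[pred][f] -= 1.0


def train_network(data, epochs=5):
    tags = sorted({t for _, sent_tags in data for t in sent_tags})
    net = LinearTagNetwork(tags)
    for _ in range(epochs):
        for words, gold_tags in data:
            for i, gold in enumerate(gold_tags):
                # Training uses the correct tags of the neighbors as context.
                net.update(extract_features(words, gold_tags, i), gold)
    return net


def tag_sentence(net, words):
    tags = []
    for i in range(len(words)):
        # At tagging time the left context comes from earlier predictions.
        tags.append(net.predict(extract_features(words, tags, i)))
    return tags


# Toy training data (hypothetical, far smaller than any real corpus).
TRAIN = [
    (["the", "dog", "barks"], ["DT", "NN", "VBZ"]),
    (["the", "cat", "sleeps"], ["DT", "NN", "VBZ"]),
    (["a", "dog", "sleeps"], ["DT", "NN", "VBZ"]),
]

net = train_network(TRAIN)
predicted = tag_sentence(net, ["the", "dog", "sleeps"])
```

Each weight vector is sparse: a target node only stores weights for features that were active in some example where it was updated, which is what keeps a very large feature space manageable.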

The POS tagger makes use of the Sequential Model, a model that facilitates both learning and evaluating the learned function when the number of potential targets for each decision is large (in this case, there are about 50 different potential POS tags).
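The text above does not spell out how the Sequential Model works. One plausible reading, sketched below in Python, is a sequence of stages that each narrow the set of candidate tags before a final scorer chooses among the survivors. The stage functions, the `OPEN_CLASS` tag set, the toy lexicon, and the toy scores are all hypothetical and are not taken from the actual tagger.

```python
from collections import defaultdict

# Hypothetical open-class tags (Penn Treebank names), used only for unseen words.
OPEN_CLASS = frozenset(
    {"NN", "NNS", "NNP", "JJ", "RB", "VB", "VBD", "VBG", "VBN", "VBZ"}
)


def build_lexicon(tagged_sentences):
    """Map each lowercased word to the set of tags it carried in training."""
    lexicon = defaultdict(set)
    for words, tags in tagged_sentences:
        for word, tag in zip(words, tags):
            lexicon[word.lower()].add(tag)
    return dict(lexicon)


def lexicon_stage(lexicon):
    """Stage 1: keep only tags the word was seen with, if it was seen at all."""
    def stage(word, candidates):
        seen = lexicon.get(word.lower())
        return candidates & seen if seen else candidates
    return stage


def open_class_stage(lexicon):
    """Stage 2: for unseen words, keep only open-class tags."""
    def stage(word, candidates):
        if word.lower() in lexicon:
            return candidates
        return candidates & OPEN_CLASS
    return stage


def sequential_predict(word, all_tags, stages, score):
    """Narrow the candidate set stage by stage, then score what is left."""
    candidates = set(all_tags)
    for stage in stages:
        reduced = stage(word, candidates)
        if reduced:  # never let a stage eliminate every candidate
            candidates = reduced
        if len(candidates) == 1:
            break
    best = max(sorted(candidates), key=score)
    return best, candidates


# Toy data (hypothetical).
TRAIN = [
    (["the", "can", "rusts"], ["DT", "NN", "VBZ"]),
    (["we", "can", "go"], ["PRP", "MD", "VB"]),
]
ALL_TAGS = {"DT", "IN", "JJ", "MD", "NN", "PRP", "RB", "VB", "VBZ"}
lexicon = build_lexicon(TRAIN)
stages = [lexicon_stage(lexicon), open_class_stage(lexicon)]

# Stand-in for the learned linear functions: a fixed score per tag.
toy_scores = {"IN": 5.0, "MD": 2.0, "NN": 1.0}


def score(tag):
    return toy_scores.get(tag, 0.0)


tag_can, left_can = sequential_predict("can", ALL_TAGS, stages, score)
tag_new, left_new = sequential_predict("blorf", ALL_TAGS, stages, score)
```

In this sketch "can" is decided between only two tags, so the high-scoring but irrelevant "IN" is never evaluated. With about 50 tags, pruning of this kind reduces both the work per decision and the number of competitors each learned function has to be separated from.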

The current system has been trained on a collection of Wall Street Journal articles, about 1 million words in total, that were tagged for POS by the Penn Treebank project.

