Part of Speech Tagging Overview

Part-of-speech (POS) tagging is the problem of assigning to each word in a sentence the part of speech that it assumes in that sentence.

Because many words are ambiguous in their part of speech, the correct tag usually has to be identified from the context in which the word appears. Consider, for example, the sentence "Many lights will light the play room so that the light people can play." Here the word "light" appears as a noun ("lights"), a verb, and an adjective. The word "play" appears as both a noun and a verb, and words such as "will" and "can" act here as modal verbs but could, in a different context, be tagged as nouns. As a result, the sentence has many possible POS taggings, only one of which is correct.
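To make the size of this ambiguity concrete, the following is a minimal sketch that counts the candidate taggings of the example sentence. The lexicon and the tag names (NOUN, VERB, ADJ, MODAL, and so on) are hypothetical and deliberately simplified for illustration; they are not the tag set or the dictionary used by our tagger.

```python
from itertools import product
from math import prod

# Hypothetical toy lexicon: the tags each word might take in SOME context.
# Both the tag names and the entries are simplified assumptions.
LEXICON = {
    "many": ["ADJ"],
    "lights": ["NOUN", "VERB"],
    "will": ["MODAL", "NOUN"],
    "light": ["VERB", "NOUN", "ADJ"],
    "the": ["DET"],
    "play": ["NOUN", "VERB"],
    "room": ["NOUN"],
    "so": ["ADV"],
    "that": ["CONJ"],
    "people": ["NOUN"],
    "can": ["MODAL", "NOUN"],
}

SENTENCE = (
    "many lights will light the play room so that the light people can play"
).split()


def count_taggings(words, lexicon):
    """Number of tag sequences obtained by choosing one tag per word."""
    return prod(len(lexicon[w]) for w in words)


def enumerate_taggings(words, lexicon):
    """Lazily yield every candidate tag sequence as a tuple of tags."""
    return product(*(lexicon[w] for w in words))


if __name__ == "__main__":
    print(count_taggings(SENTENCE, LEXICON))  # 2*2*3*2*3*2*2 = 288
```

Even with this small lexicon, a 14-word sentence yields 288 candidate taggings, which is why the tag has to be chosen from context rather than looked up word by word.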

The problem is important because identifying POS is one of the first stages in many natural language applications, such as speech recognition, translation, and information retrieval and extraction.

It is difficult to write down by hand rules that give the POS tag of a word in its context. Instead, we use learning techniques, based on the SNoW learning architecture, to generate those "rules". Our system reads many correctly tagged sentences, used as training data, and learns a function that can be used to POS tag any English sentence.
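The idea of learning a tagging function from correctly tagged sentences can be illustrated with a small sketch. It is not the SNoW architecture and not our tagger: it swaps in a plain mistake-driven multiclass perceptron as a stand-in, and the feature set (current, previous, and next word), the class name PerceptronTagger, the tag names, and the three training sentences are all assumptions made for illustration.

```python
from collections import defaultdict


def features(words, i):
    """Context features for position i: the word and its two neighbours."""
    prev_w = words[i - 1] if i > 0 else "<s>"
    next_w = words[i + 1] if i + 1 < len(words) else "</s>"
    return [f"w={words[i]}", f"prev={prev_w}", f"next={next_w}"]


class PerceptronTagger:
    """Illustrative stand-in learner (NOT SNoW): one weight per (feature, tag)."""

    def __init__(self):
        self.weights = defaultdict(lambda: defaultdict(float))
        self.tags = []

    def _predict(self, feats):
        scores = {t: 0.0 for t in self.tags}
        for f in feats:
            if f in self.weights:  # unseen features contribute nothing
                for t, w in self.weights[f].items():
                    scores[t] += w
        # Ties are broken by the (sorted) order of self.tags.
        return max(self.tags, key=lambda t: scores[t])

    def train(self, tagged_sentences, epochs=10):
        self.tags = sorted({t for s in tagged_sentences for _, t in s})
        for _ in range(epochs):
            for sent in tagged_sentences:
                words = [w for w, _ in sent]
                for i, (_, gold) in enumerate(sent):
                    feats = features(words, i)
                    guess = self._predict(feats)
                    if guess != gold:  # update only on mistakes
                        for f in feats:
                            self.weights[f][gold] += 1.0
                            self.weights[f][guess] -= 1.0

    def tag(self, words):
        return [(w, self._predict(features(words, i)))
                for i, w in enumerate(words)]


# Hypothetical, hand-tagged toy training data.
TRAIN = [
    [("the", "DET"), ("light", "ADJ"), ("people", "NOUN"),
     ("can", "MODAL"), ("play", "VERB")],
    [("many", "ADJ"), ("lights", "NOUN"), ("will", "MODAL"),
     ("light", "VERB"), ("the", "DET"), ("room", "NOUN")],
    [("the", "DET"), ("play", "NOUN"), ("room", "NOUN")],
]

tagger = PerceptronTagger()
tagger.train(TRAIN)

if __name__ == "__main__":
    print(tagger.tag("many lights will light the play room".split()))
```

In this sketch the learned weights play the role of the "rules": the same word, such as "light" or "play", receives different tags depending on its neighbours. A real tagger would need far more training data and richer features than this toy example.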

