Name Identification and Tracing  

One major difficulty in processing natural language text is that the same concept can be written in many different ways, and sometimes the same written form can mean different things. We call this the Robust Reading problem.

Robust Reading deals with this problem in the context of names. John F. Kennedy may be referred to within the same document as John F. Kennedy, Kennedy, John Fitzgerald Kennedy and more. He can be referred to in other documents as JFK, John Kennedy and Kennedy, but some of these names, both within the same document and in others, could refer to other people. Similar problems exist for other names: those of locations, organizations, etc.

This is a significant problem in all applications that interact via natural language and access unstructured information. Resolving it goes beyond string transformation and often requires context-sensitive inference.

We develop an approach to this problem that relies on:

  • Identifying Named Entities and classifying them as names of People, Locations, Organizations, etc.
  • Tracing different writings of identical concepts within a single paragraph or document, and
  • Clustering occurrences of concepts, even when written in different ways, across many documents.

The approach requires the use of several machine-learning-based, context-sensitive classifiers, clustering algorithms and global inference.
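
As a rough illustration of the tracing and clustering steps, the Python sketch below groups name mentions using a purely string-based heuristic. It is not the approach described above, which relies on learned context-sensitive classifiers. The function names (`is_variant`, `cluster`) and the matching rules (token subsequence, initials, acronyms) are assumptions made for this example, and the mention list is illustrative.

```python
from itertools import combinations


def tokens(name):
    """Lowercase a name, drop periods, and split it into tokens."""
    return name.replace(".", "").lower().split()


def token_match(s, t):
    """Tokens match if equal, or if one is a one-letter initial of the other."""
    if s == t:
        return True
    return (len(s) == 1 and t.startswith(s)) or (len(t) == 1 and s.startswith(t))


def is_variant(a, b):
    """Heuristic test (an assumption, not a learned model): could a and b
    be different writings of the same name?"""
    short, long_ = sorted((tokens(a), tokens(b)), key=len)
    if not short:
        return False
    # Acronym case: "JFK" against "John F. Kennedy".
    if len(short) == 1 and len(long_) > 1:
        if short[0] == "".join(t[0] for t in long_):
            return True
    # Otherwise the shorter name's tokens must match, in order,
    # a subsequence of the longer name's tokens.
    remaining = iter(long_)
    return all(any(token_match(s, t) for t in remaining) for s in short)


def cluster(mentions):
    """Group mentions by the transitive closure of is_variant (union-find)."""
    parent = list(range(len(mentions)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(mentions)), 2):
        if is_variant(mentions[i], mentions[j]):
            parent[find(i)] = find(j)

    groups = {}
    for i, mention in enumerate(mentions):
        groups.setdefault(find(i), []).append(mention)
    return list(groups.values())


if __name__ == "__main__":
    mentions = [
        "John F. Kennedy", "Kennedy", "John Fitzgerald Kennedy",
        "JFK", "Ted Kennedy", "Chicago",
    ]
    for group in cluster(mentions):
        print(group)
```

On this list the heuristic places "Ted Kennedy" in the same group as the John F. Kennedy mentions, because the bare surname "Kennedy" is compatible with both. String matching alone cannot tell which person an ambiguous mention refers to, which is where context-sensitive inference is needed.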