Multilingual Named Entity Discovery  


Many natural language processing tasks require identifying entities of different types in text (e.g. locations, people, and organizations). Many approaches to extracting these named entities (NEs) rely on machine learning techniques, which require large numbers of labeled examples. For many natural languages, especially less common ones, such labeled data either does not exist or is expensive to obtain. We attempt to generate such data automatically by exploiting specific properties of resources that can be obtained cheaply (e.g. from the Web). Specifically, we search a multilingual corpus, such as multilingual news streams, for transliterations or translations of NEs that are already known in another language.
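The idea of searching for transliterations can be illustrated with a minimal sketch. It is not the method used in this project, only one simple way to rank candidates: each word from a target-language document is roughly romanized and compared with a known English NE by string similarity. The romanization table, the function names (`romanize`, `similarity`, `best_transliteration`), and the 0.6 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical, deliberately incomplete Cyrillic-to-Latin table
# (an assumption for illustration, not a real transliteration standard).
ROMANIZE = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "и": "i",
    "к": "k", "л": "l", "м": "m", "н": "n", "о": "o", "п": "p", "р": "r",
    "с": "s", "т": "t", "у": "u", "ш": "sh", "ы": "y",
}


def romanize(word):
    """Map each character through the table; unknown characters pass through."""
    return "".join(ROMANIZE.get(ch, ch) for ch in word.lower())


def similarity(source_ne, candidate):
    """String similarity in [0, 1] between the NE and the romanized candidate."""
    return SequenceMatcher(None, source_ne.lower(), romanize(candidate)).ratio()


def best_transliteration(source_ne, candidates, threshold=0.6):
    """Return the most similar candidate, or None if none reaches the threshold."""
    if not candidates:
        return None
    score, best = max((similarity(source_ne, c), c) for c in candidates)
    return best if score >= threshold else None


# Words that might appear in a Russian news article published on the same
# day as an English article mentioning "Washington" (made-up example data).
candidates = ["президент", "вашингтон", "сказал", "вторник"]
print(best_transliteration("Washington", candidates))
```

With these inputs the sketch selects "вашингтон". A practical system would need a learned transliteration model rather than a fixed table, and could use further signals from the corpus beyond spelling alone.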