Information Extraction

Overview:

Information extraction (IE) is the task of extracting factual information from machine-readable documents. Machine learning approaches to IE have demonstrated superior performance and are now dominant. We have developed machine learning and inference techniques both for frame-like information extraction, which attempts to map free-form text to a database with a given schema, and for the extraction of specific relations and entities from free-form text.

Information extraction aims at pulling useful information out of free-form text so that it can be accessed later as if it were stored in a database.

Typical IE tasks include:

  1. Extracting from free-text job postings the job description, requirements, location and salary. The extracted fields can be stored in a structured database and accessed with SQL queries.
  2. Extracting from Craigslist.com postings the number of bedrooms, bathrooms, the price, the availability, and the location of a real-estate property.
  3. Extracting from email seminar announcements the speaker name, organizational affiliation, talk title, date, time, and location, and then putting this data into one's calendar.
  4. Identifying events such as political assassinations and extracting relevant information such as the victim, perpetrators, location, causes, etc. from the text. This can be used as input for quantitative social science research.
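
As a rough illustration of the frame-like setting in task 2, the sketch below fills a small fixed schema from a posting and then queries it with SQL. It is only a minimal example under stated assumptions: the hand-written regular expressions stand in for a learned extractor and are not the methods developed here, and the schema, field names, and sample posting are hypothetical.

```python
import re
import sqlite3

# Hypothetical schema: each slot has a pattern and a type converter.
# The regexes are a stand-in for a trained extractor, for illustration only.
SLOTS = {
    "bedrooms": (re.compile(r"(\d+)\s*(?:bedrooms?|br)\b", re.IGNORECASE), int),
    "bathrooms": (re.compile(r"(\d+(?:\.\d+)?)\s*(?:bathrooms?|baths?|ba)\b", re.IGNORECASE), float),
    "price": (re.compile(r"\$\s*([\d,]+)"), lambda s: int(s.replace(",", ""))),
}


def extract_listing(text):
    """Fill each slot with the first match in the text, or None if absent."""
    record = {}
    for field, (pattern, convert) in SLOTS.items():
        match = pattern.search(text)
        record[field] = convert(match.group(1)) if match else None
    return record


def store(conn, record):
    """Insert one extracted record into the (hypothetical) listings table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS listings "
        "(bedrooms INTEGER, bathrooms REAL, price INTEGER)"
    )
    conn.execute(
        "INSERT INTO listings VALUES (:bedrooms, :bathrooms, :price)", record
    )


# Made-up posting used only to exercise the sketch.
posting = "Sunny 2 bedroom, 1.5 bath apartment near campus, $1,250 per month."
record = extract_listing(posting)

conn = sqlite3.connect(":memory:")
store(conn, record)

# Once extracted, the fields can be queried like any other database rows.
rows = conn.execute(
    "SELECT bedrooms, bathrooms, price FROM listings WHERE price < 1500"
).fetchall()
print(rows)
```

A slot with no match is stored as NULL, which keeps the record shape fixed even when a posting omits a field. A learned extractor would replace `extract_listing` while leaving the storage and query steps unchanged.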

We have developed several methods for IE, focusing on accurate information extraction while requiring minimal annotation effort.

Relevant Publications: