Cognitive Computation Group Demos

Most of the information available today is in free-form text. Current technologies (Google, Yahoo) allow us to access text only via keyword search.

We would like to enable content-based access to information. Examples include:

  • Topical and functional categorization of documents: Find documents that deal with stem cell research, but only those that are calls for proposals.
  • Semantic categorization: Find documents about Columbus (the City, not the Person).
  • Retrieval of concepts and entities rather than strings in text: Find documents about JFK, the president; include those documents that mention him as "John F. Kennedy", "John Kennedy", "Congressman Kennedy", or any other possible variant, but not those that mention the baseball player John Kennedy, nor any of JFK's relatives.
  • Extraction of information based on semantic categorization: Find a list of all companies that participated in mergers in the last year. List all professors in Illinois who do research in Machine Learning.

Achieving these tasks requires programs that can, at some level, understand natural language. The collection of demos below shows some of the technologies we are developing to address these and related questions. Some address direct Information Extraction tasks, and some exhibit fundamental natural language technologies that we are developing to support better access to information. These demonstrations build on our research in Machine Learning, the fundamental research area that allows us to write programs that learn from their experience and thus come closer to human capabilities in natural language.

Comma Resolution

Commas can define various semantic relations between elements of a sentence. These relations are often implicit; that is, they are not expressed by a verb in the sentence. For example, the commas in "John, my brother, left." express the relation "John is my brother." Comma resolution is the task of identifying these relations and extracting them.

This demo shows our comma resolution system in action. It accepts sentences as user input and decomposes them into smaller ones using cues from the structure of the sentence. In doing so, it makes implicit relations (expressed by the commas) explicit.

Context-Sensitive Verb Paraphrasing

Lexical paraphrasing (replacing one word with another) is an inherently context-sensitive problem because a word's meaning depends on its context. Most paraphrasing work finds patterns and templates that can replace other patterns or templates in some context, but we aim to make decisions for a specific context. We have developed a global classifier that takes a verb v, its context (the sentence that v appears in), and a candidate verb u, and determines whether u can replace v in the given sentence while maintaining the original meaning. The classifier makes its decision by finding other contexts in which v and u appear and measuring how similar these are to the given context of v. We train the classifier without supervision, using a large set of local classifiers, each trained to locate paraphrases of a single word; these local classifiers then generate labeled data for the global classifier.
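
The context-comparison idea above can be illustrated with a minimal sketch. It is not the group's classifier: it assumes bag-of-words context vectors, cosine similarity, and a tiny hypothetical corpus, and it scores a candidate u by how similar the given context of v is, on average, to the contexts in which u occurs. The names `CORPUS`, `bag`, and `score_replacement` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy corpus; a real system would draw contexts from a large text collection.
CORPUS = [
    "the company will acquire a small startup next year",
    "the bank agreed to acquire the rival firm",
    "students acquire new skills through practice",
]


def bag(words, exclude):
    """Bag-of-words context vector that leaves out the verb itself."""
    return Counter(w for w in words if w != exclude)


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def score_replacement(sentence, v, u, corpus=CORPUS):
    """Average similarity between the given context of v and the corpus contexts of u."""
    target = bag(sentence.split(), v)
    contexts = [bag(s.split(), u) for s in corpus if u in s.split()]
    if not contexts:
        return 0.0  # u never observed: no evidence that it fits this context
    return sum(cosine(target, c) for c in contexts) / len(contexts)
```

With this corpus, "acquire" scores higher as a replacement for "buy" in "the company will buy a startup" than in "she wants to buy fresh bread", which is the context sensitivity the demo targets. The learned local and global classifiers described above replace this hand-set scoring in the actual system.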

Context-Sensitive Spelling Correction

Spelling errors that result in valid words cannot be caught by a standard dictionary-based spell checker; such errors account for some 25% of all spelling errors.

Examples include "please feel this form" and "I'd like a peace of cake". Context-sensitive spelling correction has been shown to be extremely effective at learning to correct these errors, with an accuracy greater than 95%. This demo allows users to enter text as if they were using their own editor. The program then suggests corrections for any errors it finds.
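
A common way to frame this task is with confusion sets: groups of valid words that are often mixed up, such as {peace, piece}. For each occurrence, the corrector picks the member that best fits the surrounding words. The sketch below assumes that framing and uses a simple context-word counting score trained on a few hypothetical sentences; it is not the demo's learned classifier, and `CONFUSION_SETS`, `TRAINING`, `train`, and `correct` are illustrative names.

```python
from collections import Counter, defaultdict

# Hypothetical confusion sets: valid words that are commonly confused with each other.
CONFUSION_SETS = [{"peace", "piece"}, {"feel", "fill"}]

# Hypothetical training sentences in which each confusable word is used correctly.
TRAINING = [
    "he cut a piece of bread",
    "she ate a piece of pie",
    "the treaty brought peace to the region",
    "they signed a peace agreement",
    "please fill in this form",
    "fill the form below",
    "i feel tired today",
    "you will feel better soon",
]


def context(words, i, window):
    """Words within `window` positions of position i, excluding the word itself."""
    return words[max(0, i - window):i] + words[i + 1:i + 1 + window]


def train(sentences, window=2):
    """Count the context words observed around each confusable word."""
    confusable = set().union(*CONFUSION_SETS)
    counts = defaultdict(Counter)
    for s in sentences:
        words = s.split()
        for i, w in enumerate(words):
            if w in confusable:
                counts[w].update(context(words, i, window))
    return counts


def correct(sentence, counts, window=2):
    """Lowercase the sentence and replace confusable words that a sibling fits better."""
    words = sentence.lower().split()
    out = []
    for i, w in enumerate(words):
        cset = next((c for c in CONFUSION_SETS if w in c), None)
        if cset is None:
            out.append(w)
            continue
        ctx = context(words, i, window)
        scores = {cand: sum(counts[cand][c] for c in ctx) for cand in cset}
        best = max(scores, key=scores.get)
        # Keep the original word unless another member scores strictly higher.
        out.append(best if scores[best] > scores[w] else w)
    return " ".join(out)
```

After `counts = train(TRAINING)`, `correct("please feel this form", counts)` returns `"please fill this form"`, while a correct sentence such as "you will feel better" is left unchanged.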

Coreference Resolution

A given entity (for example, a person, a location, or an organization) may be mentioned in text in multiple, ambiguous ways. Understanding natural language and supporting intelligent access to textual information requires identifying whether different entity mentions actually refer to the same entity. The Coreference Resolution Demo processes unannotated text, detecting mentions of entities and showing which mentions are coreferential.
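
Once mentions are detected, pairwise coreference decisions must be turned into groups. One common strategy is best-link clustering: each mention joins the cluster of its highest-scoring earlier mention, provided that score is high enough. The sketch below assumes that strategy; `score` is a hypothetical token-overlap stand-in for a learned pairwise classifier, and the threshold of 0.3 is arbitrary.

```python
def score(a, b):
    """Hypothetical pairwise score: token overlap (Jaccard) between two mention strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def best_link(mentions, threshold=0.3):
    """Link each mention to its best-scoring earlier mention if the score exceeds threshold."""
    cluster_of = list(range(len(mentions)))  # each mention starts in its own cluster
    for j in range(1, len(mentions)):
        best_i = max(range(j), key=lambda i: score(mentions[i], mentions[j]))
        if score(mentions[best_i], mentions[j]) > threshold:
            cluster_of[j] = cluster_of[best_i]  # earlier mentions already hold final ids
    clusters = {}
    for mention, cid in zip(mentions, cluster_of):
        clusters.setdefault(cid, []).append(mention)
    return list(clusters.values())
```

For the mentions "John F. Kennedy", "Dallas", "Kennedy", "John Kennedy", "Texas", this groups the three Kennedy mentions together and leaves the two locations on their own. A pure string-overlap score could not, however, separate JFK from the baseball player John Kennedy, which is why the pairwise decisions need to be learned and context-aware.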

Dataless Classification

Dataless Classification is a learning protocol that uses world knowledge to induce classifiers without the need for any labeled data. Like humans, a dataless classifier interprets a string of words as a set of semantic concepts.

This demo shows the idea in action: the user enters arbitrary text and a set of class labels, and, without any training, the system classifies the text into those labels.
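
The core of the protocol can be sketched as follows: map both the text and each label name into the same space of semantic concepts, then choose the label whose concept vector is closest to the text's. This is a minimal sketch under stated assumptions: the word-to-concept table `CONCEPTS` is hypothetical and tiny, standing in for concept representations derived from world knowledge such as Wikipedia, and cosine similarity is one reasonable choice of closeness.

```python
import math
from collections import Counter

# Hypothetical word-to-concept table. A real dataless classifier would derive concept
# representations from world knowledge such as Wikipedia.
CONCEPTS = {
    "goal": ["Football", "Sport"],
    "striker": ["Football", "Sport"],
    "match": ["Sport"],
    "sports": ["Sport"],
    "senator": ["Politics"],
    "vote": ["Voting", "Politics"],
    "election": ["Voting", "Politics"],
    "politics": ["Politics"],
}


def concept_vector(text):
    """Map a string of words to counts of the concepts those words evoke."""
    vec = Counter()
    for word in text.lower().split():
        vec.update(CONCEPTS.get(word, []))
    return vec


def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify(text, labels):
    """Pick the label whose concept vector is most similar to the text's; no labeled data used."""
    text_vec = concept_vector(text)
    return max(labels, key=lambda label: cosine(text_vec, concept_vector(label)))
```

For example, `classify("The striker scored a late goal in the match", ["sports", "politics"])` returns `"sports"`. Note that the labels are treated as text, just like the document, so new label sets need no retraining.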

Dependency Parsing

Dependency trees provide a syntactic representation that encodes functional relationships between words, and they carry valuable information for analyzing sentences. We have developed a dependency parsing framework in which decisions are made in a pipeline model based on a bottom-up parsing algorithm.
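
A bottom-up parser of this general kind can be sketched as a loop over adjacent pairs of partial trees: at each pair it decides whether one word becomes the head of the other, or whether to move on. Each decision feeds the next one, which is what makes it a pipeline. This is a minimal sketch, not the group's framework: the action names `LEFT`, `RIGHT`, and `SHIFT` and the functions `parse` and `oracle` are hypothetical, and `oracle` reads decisions off a known gold tree where the real system would use a learned classifier.

```python
def parse(n, choose):
    """Bottom-up parse over word indices 1..n; returns {word: head}, with 0 meaning root.

    choose(left, right, heads) returns "LEFT" (right becomes head of left),
    "RIGHT" (left becomes head of right), or "SHIFT" (move to the next pair).
    """
    seq = list(range(1, n + 1))  # roots of the partial trees, in sentence order
    heads = {}
    changed = True
    while len(seq) > 1 and changed:
        changed = False
        i = 0
        while i < len(seq) - 1:
            left, right = seq[i], seq[i + 1]
            action = choose(left, right, heads)
            if action == "LEFT":
                heads[left] = right
                del seq[i]
            elif action == "RIGHT":
                heads[right] = left
                del seq[i + 1]
            else:
                i += 1
                continue
            changed = True
            i = max(i - 1, 0)  # step back so the enlarged tree can attach leftwards
    for w in seq:
        heads[w] = 0  # whatever remains unattached hangs off the root
    return heads


def oracle(gold):
    """Hypothetical stand-in for a learned classifier: decisions taken from a gold tree."""
    def complete(w, heads):
        # A word may be attached only after it has collected all of its own dependents.
        return all(d in heads for d, h in gold.items() if h == w)

    def choose(left, right, heads):
        if gold[left] == right and complete(left, heads):
            return "LEFT"
        if gold[right] == left and complete(right, heads):
            return "RIGHT"
        return "SHIFT"
    return choose
```

For "The dog barked loudly" with gold heads `{1: 2, 2: 3, 3: 0, 4: 3}`, `parse(4, oracle(gold))` recovers the same tree. With a learned `choose`, an early mistake propagates to later decisions, which is the central difficulty of pipeline models.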

Information Extraction

Useful and important information can be extracted from large numbers of unorganized documents, such as news articles and emails, and stored in databases. It is then relatively easy to answer the kind of structured queries that ordinary search engines do not support. We demonstrate the technology by showing its ability to extract specific phrases of interest from two types of documents: seminar announcements and job postings.

Multi-view Text Passage Comparison

This tool uses a range of metrics to compare two text spans and presents a visual mapping between them.

Multilingual Named Entity Discovery

A basic sub-task of many natural language processing problems is identifying words or phrases of specific types (e.g., locations, people, and organizations) in text; this task is commonly called Named Entity Recognition (NER). Most successful approaches to NER require large amounts of text with Named Entities tagged by a human annotator. However, such resources do not exist for many languages, especially less common ones. We demonstrate a method to automatically generate such resources from multilingual corpora (such as multilingual news streams).

Name Identification and Tracing

Understanding natural language and supporting intelligent access to textual information requires identifying whether different mentions of a name, within and across documents, represent the same entity. We demonstrate a browsing tool that incorporates some of our newly developed Machine Learning based technologies in this area. It enables users to trace different mentions of the same entity, presented in different textual forms, across documents.

Named Entity Recognition

Named entity recognition is the task of identifying which phrases in text are names of people, which are names of locations, organizations, and so on. This is a fundamental task in information extraction because it provides a level of abstraction required to support the kind of interaction people are comfortable with. The task is context-sensitive, as the following sentence shows: "Jakob Washington left for Denver to meet with John Denver, who works for Washington Mutual." Here "Washington" is part of a person's name in one place and part of an organization's name in another, while "Denver" is a location in one place and part of a person's name in another.
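
NER output is often encoded as one BIO tag per token (B = beginning of an entity, I = inside it, O = outside any entity), from which entity phrases are then read off. The sketch below assumes that encoding; the tags for the example sentence are written by hand to show the intended result, not produced by the demo, and `bio_to_spans` is a hypothetical helper name.

```python
def bio_to_spans(tokens, tags):
    """Group BIO-tagged tokens into (entity_type, phrase) pairs."""
    spans, current, ctype = [], [], None
    for tok, tag in zip(tokens, tags):
        # A B- tag, or an I- tag whose type differs from the open entity, starts a new entity.
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != ctype):
            if current:
                spans.append((ctype, " ".join(current)))
            current, ctype = [tok], tag[2:]
        elif tag.startswith("I-"):
            current.append(tok)
        else:  # "O" closes any open entity
            if current:
                spans.append((ctype, " ".join(current)))
            current, ctype = [], None
    if current:
        spans.append((ctype, " ".join(current)))
    return spans


# Hand-written tags for the example sentence (punctuation omitted).
TOKENS = "Jakob Washington left for Denver to meet with John Denver who works for Washington Mutual".split()
TAGS = ["B-PER", "I-PER", "O", "O", "B-LOC", "O", "O", "O",
        "B-PER", "I-PER", "O", "O", "O", "B-ORG", "I-ORG"]
```

Here `bio_to_spans(TOKENS, TAGS)` yields `[("PER", "Jakob Washington"), ("LOC", "Denver"), ("PER", "John Denver"), ("ORG", "Washington Mutual")]`: the same surface words "Washington" and "Denver" receive different types depending on context.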

Number Quantization

Number Quantization refers to the task of recognizing the values of numbers written in text. This tool recognizes numerical entities whether they are written as words or numerals, and can support comparison of commensurate numerical types (e.g. dates).
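
Recognizing the value of a number written in words amounts to accumulating small values and multiplying by scale words. The sketch below covers only non-negative cardinal numbers written as words, digits, or a mix (for example "1.5 million"); it does not reproduce the demo's handling of dates or other typed quantities, and the names `UNITS`, `TENS`, `SCALES`, and `quantize` are hypothetical.

```python
UNITS = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve thirteen "
    "fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}
SCALES = {"thousand": 10**3, "million": 10**6, "billion": 10**9}


def quantize(phrase):
    """Return the value of a non-negative number written in words, digits, or a mix."""
    cleaned = phrase.lower().replace("-", " ").replace(",", "")
    tokens = [t for t in cleaned.split() if t != "and"]
    total, current = 0, 0
    for tok in tokens:
        if tok in UNITS:
            current += UNITS[tok]
        elif tok in TENS:
            current += TENS[tok]
        elif tok == "hundred":
            current = (current or 1) * 100  # "hundred" alone means 100
        elif tok in SCALES:
            total += (current or 1) * SCALES[tok]  # close off the group before the scale word
            current = 0
        else:
            # Digits; raises ValueError for words that are not numbers.
            current += float(tok) if "." in tok else int(tok)
    return total + current
```

Because every form maps to a plain number, values written differently become directly comparable: `quantize("two thousand") > quantize("1,500")` is `True`.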

Part of Speech Tagging

Assigning each word in a sentence the part of speech (POS) that it takes in that sentence is important because POS identification is one of the early stages of many natural language applications, such as speech recognition, translation, and information retrieval and extraction. See how it's done!

Relation Identification

We demonstrate a novel and robust approach to identifying relations between pairs of concepts. We focus on relations that are essential to supporting textual inference: determining whether one concept is an ancestor of the other, or whether the two are siblings. Our method uses Wikipedia as its main source of background knowledge.
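
Once background knowledge is organized as an is-a hierarchy, the two target relations can be read off it directly. The sketch below assumes a small hypothetical child-to-parent table `PARENT` in place of knowledge extracted from Wikipedia, and it assumes that siblings are concepts sharing the same immediate parent; the function names are illustrative.

```python
# Hypothetical is-a hierarchy (child -> parent). The demo draws its background
# knowledge from Wikipedia instead.
PARENT = {
    "poodle": "dog", "beagle": "dog", "dog": "mammal", "cat": "mammal",
    "mammal": "animal", "sparrow": "bird", "bird": "animal",
}


def ancestors(concept):
    """Return the ancestors of a concept, nearest first."""
    chain = []
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def relation(x, y):
    """Classify the pair as 'ancestor' (x above y), 'descendant', 'sibling', or 'none'."""
    if x in ancestors(y):
        return "ancestor"
    if y in ancestors(x):
        return "descendant"
    # Assumption: siblings share the same immediate parent.
    if x != y and x in PARENT and PARENT.get(y) == PARENT[x]:
        return "sibling"
    return "none"
```

With this table, `relation("mammal", "poodle")` is `"ancestor"` and `relation("dog", "cat")` is `"sibling"`. The hard part in practice is building reliable hierarchy knowledge for arbitrary concept pairs, which is where Wikipedia comes in.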

Semantic Role Labeling

Beyond the syntactic analysis of natural language sentences lies the extraction of their semantic information. Semantic role labeling is one such task: it identifies the verb-argument structure of natural language sentences and is an important step toward natural language understanding.

Shallow Parsing

Enabling a machine to respond to natural language input requires that the machine be able to identify syntactic phrases in sentences. It is virtually impossible to manually write a comprehensive set of rules that accurately defines the appropriate solution to every task of this nature. However, the availability of annotated corpora (collections of text) and robust machine learning techniques makes it possible to have machines learn this task from training examples.

Text Analysis

This tool annotates raw text with various kinds of syntactic and semantic information, including syntactic parse trees, named entities, semantic roles, and nominal relations.

Textual Entailment

It is easy for a human to see that the sentence "Joe Smith offers a generous gift to the university." implies "Joe Smith contributes to academia.", but it is extremely hard for a machine. Tackling this task is an important step toward natural language understanding. This demonstration presents a system that aims to tackle this problem.

Word Similarity

A word similarity metric using WordNet and other resources.
