Learning Based Programming

Supported by NSF

Period: 2006-2009

A significant amount of the software written today interacts with naturally occurring (sensor) data such as text, speech, images and video, streams of financial data, and biological sequences. Such software frequently needs to reason about concepts that are complex and often difficult to define explicitly in terms of the raw data observed. Examples include determining the gender of a person in an image; determining the topic of an article; determining the role of a noun phrase in a sentence; determining whether more than three people are currently meeting in someone’s office; or scheduling a computation in a grid in a way that adapts to a multitude of properties of the resources and links. Applications that require such abilities are expected to grow rapidly in importance in the coming years.

While conventional programming languages rely on a programmer to explicitly define all the concepts and relations involved, programming with naturally occurring data that is highly variable and ambiguous at the measurement level necessitates a programming model in which some of the variables, concepts, and relations may not be known at programming time, may be defined only in a data-driven way, or may not be unambiguously defined without relying on other concepts acquired this way. Such a model must make it possible to reason with respect to variables that do not depend on tight assumptions about the environment in which the measurements are taken, and it needs to center on a semantic-level interaction model made possible by components that are data-dependent and support abstractions over real-world observations. Today's programming paradigms, and the corresponding programming languages, are not conducive to that goal. Consequently, despite two decades of progress in machine learning and a clear need for systems with significant trainable (data-dependent) components, few systems today incorporate substantial machine learning components, and even fewer use more than a single classifier.

In the Learning Based Programming (LBP) project, we explore a novel software engineering paradigm that lets a programmer seamlessly incorporate trainable variables into a program and, consequently, reason using high-level concepts without explicitly defining them in terms of all the variables they might depend on, or the functional dependencies among them. These may instead be determined in a data-driven way, via learning operators whose details are abstracted away from the programmer. In this work, we flesh out the details of the LBP paradigm we envision, implement an LBP language, Learning Based Java (LBJ), and study it via the development of applications in two areas: ubiquitous computing and natural language processing.
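As a rough illustration of what a trainable variable might look like from the programmer's side, the sketch below uses plain Python rather than LBJ syntax. The names LearnedConcept, bag_of_words, is_sports, and route_article are hypothetical, the training sentences are made up for the example, and the mistake-driven perceptron update is only one simple stand-in for the learning operators that the paradigm would hide from the programmer.

```python
class LearnedConcept:
    """Hypothetical trainable boolean concept (not the LBJ API).

    The programmer supplies a feature extractor and labeled examples;
    the decision rule itself is learned rather than written by hand.
    """

    def __init__(self, extract_features, epochs=10):
        self.extract_features = extract_features
        self.epochs = epochs
        self.weights = {}   # one weight per observed feature
        self.bias = 0.0

    def _score(self, features):
        return self.bias + sum(self.weights.get(f, 0.0) for f in features)

    def train(self, examples):
        """Perceptron training: adjust weights only on mistakes."""
        for _ in range(self.epochs):
            for raw, label in examples:
                features = self.extract_features(raw)
                predicted = self._score(features) > 0
                if predicted != label:
                    delta = 1.0 if label else -1.0
                    self.bias += delta
                    for f in features:
                        self.weights[f] = self.weights.get(f, 0.0) + delta

    def __call__(self, raw):
        return self._score(self.extract_features(raw)) > 0


def bag_of_words(text):
    """Assumed feature extractor: the set of lowercase tokens."""
    return set(text.lower().split())


# The concept "is this article about sports?" is declared, then trained
# from data instead of being defined explicitly over raw words.
is_sports = LearnedConcept(bag_of_words)
is_sports.train([
    ("the team won the match", True),
    ("the striker scored a late goal", True),
    ("the senate passed the budget", False),
    ("the minister announced a new tax", False),
])


def route_article(text):
    """Ordinary program logic that reasons with the learned concept."""
    return "sports desk" if is_sports(text) else "news desk"
```

The point of the sketch is the shape of route_article: it branches on a high-level concept as if it were an ordinary boolean function, while the dependence of that concept on the raw text is determined from examples.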

Relevant Projects:

Relevant Publications: