Learning Based Programming

Overview:

Learning-based programs are software applications that use machine learning to interact with naturally occurring data that are highly variable and ambiguous. They lend themselves to a computational model in which some of the variables, concepts, and relations may be defined only in a data-driven way, or may not be unambiguously defined without relying on other concepts acquired this way. Learning Based Programming (LBP) is a new programming paradigm for specifying computations under this model.

Details:

Learning-based programs are software applications that use machine learning to interact with naturally occurring data that are highly variable and ambiguous. They lend themselves to a computational model in which some of the variables, concepts, and relations may be defined only in a data-driven way, or may not be unambiguously defined without relying on other concepts acquired this way. Unfortunately, neither modern programming languages nor the mathematical abstractions used so cleanly by machine learning researchers facilitate such a model. As a result, the design of systems with multiple learning components becomes quite complex, and implementing them efficiently requires expertise in both the selected learning algorithms and the application domain. Even when such expertise is available, implementing a conceptually simple learning-based program can be time-consuming and error-prone.

Learning Based Programming (LBP) is a programming paradigm that addresses these issues. In LBP, the implementation details of feature extraction, learning, and inference are abstracted away from the programmer, who can then focus more directly on the design of the application. To accomplish this, an LBP implementation formalizes the definitions of these concepts so that they can be integrated into a programming language, enabling the programmer to use them as building blocks. Using LBP, a programmer names classifiers and optionally provides hard-coded definitions for them. Where hard-coded definitions are not available, other classifiers are designated as information sources, and the compiler takes care of learning the desired classifiers from data. Constraints over the outputs of one or more classifiers may also be imposed declaratively, and learned classifiers will automatically respect them.
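To make the idea of a declarative constraint concrete, the following is a minimal Java sketch, not LBJ code. It assumes two classifiers that each assign a score to every candidate label, states the constraint as a predicate over their joint output, and uses exhaustive search as a stand-in for whatever inference algorithm an LBP implementation actually employs. The class name, the labels, and the scores are all hypothetical.

```java
import java.util.Map;
import java.util.function.BiPredicate;

/** Hypothetical illustration of constrained inference; none of these names come from LBJ. */
final class ConstrainedInferenceSketch {

    /** Declarative constraint: a "lives_in" relation requires its argument to be a location. */
    static final BiPredicate<String, String> LIVES_IN_NEEDS_LOCATION =
        (entityType, relation) -> !relation.equals("lives_in") || entityType.equals("location");

    /**
     * Returns the highest-scoring {entityLabel, relationLabel} pair that satisfies the
     * constraint, or null if no pair satisfies it. Exhaustive search is an assumption
     * made for brevity; it only scales to tiny label sets.
     */
    static String[] infer(Map<String, Double> entityScores,
                          Map<String, Double> relationScores,
                          BiPredicate<String, String> constraint) {
        String[] best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : entityScores.entrySet()) {
            for (Map.Entry<String, Double> r : relationScores.entrySet()) {
                if (!constraint.test(e.getKey(), r.getKey())) {
                    continue; // skip joint assignments that violate the constraint
                }
                double total = e.getValue() + r.getValue();
                if (total > bestScore) {
                    bestScore = total;
                    best = new String[] {e.getKey(), r.getKey()};
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up scores standing in for the outputs of two learned classifiers.
        Map<String, Double> entityScores = Map.of("person", 0.6, "location", 0.4);
        Map<String, Double> relationScores = Map.of("lives_in", 0.9, "none", 0.1);
        String[] best = infer(entityScores, relationScores, LIVES_IN_NEEDS_LOCATION);
        System.out.println(best[0] + ", " + best[1]);
    }
}
```

Taken independently, each classifier would choose "person" and "lives_in", which violates the constraint. The joint search instead settles on the best assignment that respects it, which is the behavior the paragraph above describes.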

Learning Based Java (LBJ) is our implementation of LBP. It accepts the practitioner's classification model as input and automatically generates efficient Java code that implements the trained classifier's entire computation, from raw data to output decision. LBJ is best viewed as a programming framework in which the practitioner defines a classification model as a set of classifier specifications and a set of constraints over them. A classifier may be specified by

  • coding it explicitly in Java,
  • using operators to build it from existing classifiers,
  • or identifying feature extraction classifiers and a data source to learn it over.

The LBJ compiler then generates code and trains the learning classifiers as necessary, employing inference algorithms to resolve the constraints. When programming in LBJ, the practitioner reasons directly in terms of the data, disregarding the cumbersome details of implementing learning and inference algorithms.
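The three ways of specifying a classifier listed above can be sketched in plain Java. This is an illustration under stated assumptions rather than LBJ syntax or generated code: the class and method names are hypothetical, the conjunction operator is one example of building a classifier from existing ones, and a simple perceptron stands in for whichever learning algorithm is selected.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical illustration of the three specification styles; not LBJ code. */
final class ClassifierSketch {

    // 1. Hard-coded classifiers: ordinary Java functions from an example to a feature.
    static final Function<String, String> CAPITALIZED =
        w -> "cap=" + (!w.isEmpty() && Character.isUpperCase(w.charAt(0)));
    static final Function<String, String> SUFFIX =
        w -> "suf=" + w.substring(Math.max(0, w.length() - 2)).toLowerCase();

    // 2. An operator that builds a new classifier from existing ones (feature conjunction).
    static <X> Function<X, String> conjunction(Function<X, String> a, Function<X, String> b) {
        return x -> a.apply(x) + "&" + b.apply(x);
    }

    // 3. A learned classifier: a binary perceptron over the outputs of feature extractors.
    static final class LearnedClassifier<X> implements Function<X, Boolean> {
        private final List<Function<X, String>> extractors;
        private final Map<String, Double> weights = new HashMap<>();

        LearnedClassifier(List<Function<X, String>> extractors) {
            this.extractors = extractors;
        }

        @Override
        public Boolean apply(X x) {
            double score = 0.0;
            for (Function<X, String> f : extractors) {
                score += weights.getOrDefault(f.apply(x), 0.0);
            }
            return score > 0.0;
        }

        /** Standard mistake-driven perceptron updates over the labeled data source. */
        void train(List<X> examples, List<Boolean> labels, int epochs) {
            for (int epoch = 0; epoch < epochs; epoch++) {
                for (int i = 0; i < examples.size(); i++) {
                    X x = examples.get(i);
                    boolean gold = labels.get(i);
                    if (apply(x) != gold) {
                        double delta = gold ? 1.0 : -1.0;
                        for (Function<X, String> f : extractors) {
                            weights.merge(f.apply(x), delta, Double::sum);
                        }
                    }
                }
            }
        }
    }

    /** Trains a toy "is this word a proper noun?" classifier on made-up data. */
    static LearnedClassifier<String> trainProperNounClassifier() {
        List<Function<String, String>> extractors =
            List.of(CAPITALIZED, SUFFIX, conjunction(CAPITALIZED, SUFFIX));
        LearnedClassifier<String> classifier = new LearnedClassifier<>(extractors);
        classifier.train(
            List.of("Paris", "walked", "London", "table", "Berlin", "quickly"),
            List.of(true, false, true, false, true, false),
            5);
        return classifier;
    }

    public static void main(String[] args) {
        LearnedClassifier<String> properNoun = trainProperNounClassifier();
        for (String w : List.of("Madrid", "jumped")) {
            System.out.println(w + " -> " + properNoun.apply(w));
        }
    }
}
```

In this sketch the training loop is written by hand. Under LBJ, as described above, the practitioner would only name the feature extractors and the data source, and the compiler would generate and run the equivalent training code.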
