Constraint Driven Learning

Details:

Decision making in domains such as natural language processing is characterized by ambiguity and partial or imperfect information sources, which necessitates the use of models learned from data. Decisions made with these models often involve assigning values to sets of interdependent variables where the expressive dependency structure among variables of interest can influence, or even dictate, what assignments are possible. To cope with these difficulties, problems are typically modeled as stochastic processes involving both output variables and the information sources, often referred to as input or observed variables.

There exist several fundamentally different approaches to learning models that can assign values simultaneously to several interdependent variables. One approach is to completely ignore the output structure at the learning stage (by learning local models that make independent local decisions), while enforcing coherent assignments at the inference stage. Another solution is to, directly or indirectly, model the dependencies among the output variables in the learning process and thus induce models that optimize a global performance measure. In this scenario, to allow efficient training and inference, the model of the joint distribution is factored into functions of subsets of the variables, yielding models such as Markov Random Fields (MRFs), Conditional Random Fields (CRFs) and Hidden Markov Models (HMMs).

However, in many problems dependencies among output variables have non-local nature, and incorporating them into the model as if they were probabilistic phenomena can undo a great deal of the benefit gained by factorization, as well as make the model more difficult to design and understand. For example, consider an information extraction task where two particular types of entities cannot appear together in the same document. Modeling mutual exclusion in the scenario where $n$ random variables can be assigned mutually exclusive values introduces $n^2$ pairwise edges in the graphical model, with obvious impact on training and inference. While efficient algorithms for leveraging a particular type of constraint can be developed, modeling of declarative non-local constraints this way is clearly very expensive. Moreover, a lot of parameters are being wasted in order to to learn something the model designer already knows.

In this work, we provide a framework that augments linear models with declarative constraints as a way to support decisions in an expressive output space. Our framework injects the constraints directly instead of doing it indirectly as mentioned above. it allows the use of expressive constraints while keeping models simple and easy to understand. Factoring the models by separating declarative constraints naturally brings up interesting questions and calls for novel training and inference algorithms. Our results show that the constraints play an important role under both supervised and semi-supervised learning setting.