
[ Details | Participants ]
There exist several fundamentally different approaches to learning models that can assign values simultaneously to several interdependent variables. One approach is to completely ignore the output structure at the learning stage (by learning local models that make independent local decisions), while enforcing coherent assignments at the inference stage. Another solution is to, directly or indirectly, model the dependencies among the output variables in the learning process and thus induce models that optimize a global performance measure. In this scenario, to allow efficient training and inference, the model of the joint distribution is factored into functions of subsets of the variables, yielding models such as Markov Random Fields (MRFs), Conditional Random Fields (CRFs) and Hidden Markov Models (HMMs).
However, in many problems dependencies among output variables have non-local nature, and incorporating them into the model as if they were probabilistic phenomena can undo a great deal of the benefit gained by factorization, as well as make the model more difficult to design and understand. For example, consider an information extraction task where two particular types of entities cannot appear together in the same document. Modeling mutual exclusion in the scenario where $n$ random variables can be assigned mutually exclusive values introduces $n^2$ pairwise edges in the graphical model, with obvious impact on training and inference. While efficient algorithms for leveraging a particular type of constraint can be developed, modeling of declarative non-local constraints this way is clearly very expensive. Moreover, a lot of parameters are being wasted in order to to learn something the model designer already knows.
In this work, we provide a framework that augments linear models with declarative constraints as a way to support decisions in an expressive output space. Our framework injects the constraints directly instead of doing it indirectly as mentioned above. it allows the use of expressive constraints while keeping models simple and easy to understand. Factoring the models by separating declarative constraints naturally brings up interesting questions and calls for novel training and inference algorithms. Our results show that the constraints play an important role under both supervised and semi-supervised learning setting.