Machine Learning and Natural Language
Spring 2016
Course Plan and Lecture Notes
Note: Topics, Lecture Notes, Relevant Papers and Presentations will be made available and will be updated throughout the semester.
Papers recommended for presentation are marked P, usually followed by their category (e.g., P=CRF).
I. Introduction
Introduction to the Class [PPT] [PDF] (01/20)
NLP Problems; Key Approaches
Models of Classification and Multiclass Classification (01/27)
Discriminative Models of Classification; Review I [PPT] [PDF]
Discriminative Models of Classification; Review II [PPT] [PDF]
Multiclass and Constraint Classification [PPT] [PDF]
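As an illustration of the multiclass setting in this unit, here is a minimal sketch of the standard multiclass perceptron, assuming dense feature vectors given as Python lists. It keeps one weight vector per class and predicts the class with the highest score. On a mistake it repairs the violated constraint w[label]·x > w[pred]·x by promoting the true class and demoting the predicted one, which is the kind of pairwise constraint that constraint classification (Har-Peled, Roth and Zimak) reasons about. The function names and toy data are hypothetical, not taken from the course materials.

```python
def dot(w, x):
    """Inner product of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))

def predict_one(W, x):
    """Return the index of the highest-scoring class (first one on ties)."""
    scores = [dot(w, x) for w in W]
    return max(range(len(W)), key=lambda k: scores[k])

def train_multiclass_perceptron(X, y, num_classes, epochs=10):
    """Learn one weight vector per class with perceptron updates."""
    W = [[0.0] * len(X[0]) for _ in range(num_classes)]
    for _ in range(epochs):
        mistakes = 0
        for x, label in zip(X, y):
            pred = predict_one(W, x)
            if pred != label:
                # Violated constraint w[label].x > w[pred].x:
                # promote the true class, demote the predicted one.
                W[label] = [wi + xi for wi, xi in zip(W[label], x)]
                W[pred] = [wi - xi for wi, xi in zip(W[pred], x)]
                mistakes += 1
        if mistakes == 0:  # every training example is classified correctly
            break
    return W

# Toy, linearly separable data; the last feature is a constant bias term.
X = [[1, 0, 1], [0, 1, 1], [-1, -1, 1]]
y = [0, 1, 2]
W = train_multiclass_perceptron(X, y, num_classes=3)
print([predict_one(W, x) for x in X])  # [0, 1, 2]
```

On separable data like this the loop stops as soon as a full pass makes no mistakes; on non-separable data it simply runs for `epochs` passes.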
Reading:
General: ML in NLP
L. Marquez,
Machine Learning and Natural Language Processing
C. Cardie and R. Mooney,
Guest Editors' Introduction: Machine Learning and Natural Language Processing
Machine Learning Journal, Special Issue on Natural Language Learning, 34(1/2/3), 1999
P. Fung and D. Roth,
Guest Editors' Introduction: Machine Learning in Speech and Language Technologies
Machine Learning Journal, Special Issue on Natural Language Learning, 60(1/2/3), 2005
Generative and Discriminative Models
A. Ng and M. Jordan,
On Discriminative vs. Generative Classifiers: A Comparison of Logistic Regression and Naive Bayes
The Conference on Advances in Neural Information Processing Systems (NIPS) 2002
D. Roth,
Learning to Resolve Natural Language Ambiguities: A Unified Approach
AAAI 1998
D. Roth
Learning in Natural Language
IJCAI'99
Multiclass
S. Har-Peled, D. Roth and D. Zimak,
Constraint Classification for Multiclass Classification and Ranking
The Conference on Advances in Neural Information Processing Systems (NIPS) 2003
K. Crammer and Y. Singer,
Ultraconservative Online Algorithms for Multiclass Problems
JMLR 2003
Y. Even-Zohar and D. Roth,
A Sequential Model for Multi Class Classification
X. Li and D. Roth,
Learning Question Classifiers: The Role of Semantic Information
Natural Language Engineering
M. Gupta, S. Bengio and J. Weston,
Training Highly Multiclass Classifiers
JMLR 15, 2014
II. Basic Structured Models: Sequential Models
Sequence Labeling Problems (02/03, 02/10)
Introduction to Structures [PPT] [PDF]
Models of Sequences [PPT] [PDF]
Additional slides on sequential models (HMM, MEMM, PMM) [PPT] [PDF]
(02/18, 02/23, 02/25)
HMMs and CRFs
Inference with Classifiers I
Structured Perceptron
Structured SVMs
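A minimal sketch of a structured perceptron in the style of Collins (EMNLP'02) for sequence labeling, assuming a first-order model whose features are (tag, word) emission indicators and (previous tag, tag) transition indicators. Viterbi decoding supplies the exact argmax for this model, and each mistake moves the weights toward the gold features and away from the predicted ones. The feature templates, the `START` symbol and the toy sentences are illustrative assumptions, and this version omits the weight averaging that Collins recommends.

```python
from collections import defaultdict

START = "<s>"  # hypothetical start symbol for the first transition

def features(words, tags):
    """Count (tag, word) emission and (prev_tag, tag) transition features."""
    feats = defaultdict(int)
    prev = START
    for word, tag in zip(words, tags):
        feats[("emit", tag, word)] += 1
        feats[("trans", prev, tag)] += 1
        prev = tag
    return feats

def viterbi(words, tagset, w):
    """Highest-scoring tag sequence under weights w for a first-order model."""
    # best[i][t] = (score of the best prefix ending in tag t at i, backpointer)
    best = [{t: (w.get(("trans", START, t), 0.0)
                 + w.get(("emit", t, words[0]), 0.0), None) for t in tagset}]
    for i in range(1, len(words)):
        column = {}
        for t in tagset:
            emit = w.get(("emit", t, words[i]), 0.0)
            column[t] = max(
                (best[i - 1][p][0] + w.get(("trans", p, t), 0.0) + emit, p)
                for p in tagset
            )
        best.append(column)
    # Follow backpointers from the best final tag.
    tags = [max(tagset, key=lambda t: best[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        tags.append(best[i][tags[-1]][1])
    return tags[::-1]

def train(data, tagset, epochs=5):
    """Plain (non-averaged) structured perceptron."""
    w = defaultdict(float)
    for _ in range(epochs):
        for words, gold in data:
            pred = viterbi(words, tagset, w)
            if pred != gold:
                # Move w toward the gold features and away from the prediction.
                for f, c in features(words, gold).items():
                    w[f] += c
                for f, c in features(words, pred).items():
                    w[f] -= c
    return w

TAGS = ["D", "N", "V"]
data = [(["the", "dog", "barks"], ["D", "N", "V"]),
        (["a", "cat", "sleeps"], ["D", "N", "V"])]
w = train(data, TAGS)
print(viterbi(["the", "cat", "barks"], TAGS, w))  # ['D', 'N', 'V']
```

Because learning only needs an argmax, the same loop works with any decoder; swapping Viterbi for ILP or beam search is what the inference-based training papers later in the course build on.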
Reading:
Background
Chapters 9-10, Manning and Schütze
Optional background on general Graphical Models [PDF]
Bengio,
Markovian Models for Sequential Data
L. R. Rabiner,
A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition
Proceedings of the IEEE, vol. 77, no. 2, 1989
Inference with Classifiers
P=Const
V. Punyakanok and D. Roth,
The Use of Classifiers in Sequential Inference
The Conference on Advances in Neural Information Processing Systems (NIPS) 2001
Andrew McCallum, Dayne Freitag, and Fernando Pereira,
Maximum Entropy Markov Models for Information Extraction and Segmentation
ICML 2000
CRF
P=CRF
J. Lafferty, A. McCallum and F. Pereira
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
ICML'01
C. Sutton and A. McCallum
An Introduction to Conditional Random Fields for Relational Learning
In Introduction to Statistical Relational Learning, 2007
P=CRF
André F. T. Martins, Noah A. Smith, Pedro M. Q. Aguiar, and Mário A. T. Figueiredo
Structured Sparsity in Structured Prediction
EMNLP 2011
Perceptron
P=Perc
M. Collins,
Discriminative Training for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms
EMNLP'02.
P
H. Daumé III, J. Langford and D. Marcu,
Search-based Structured Prediction
Machine Learning 2009
J.R. Doppa, A. Fern and P. Tadepalli,
HC-Search: A Learning Framework for Search-based Structured Prediction
JAIR 2014
SVM
C. Burges
A Tutorial on Support Vector Machines for Pattern Recognition
1998
P=SVM
B. Taskar, C. Guestrin and D. Koller
Max-Margin Markov Networks
NIPS 2003
P=SVM
I. Tsochantaridis, T. Hofmann, T. Joachims and Y. Altun,
Large Margin Methods for Structured and Interdependent Output Variables
JMLR 2005
III. Constrained Conditional Models
Pipeline Models
Integer Linear Programming
Introducing Background knowledge
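To make the idea of injecting background knowledge at inference time concrete, here is a small sketch in the spirit of the entity-relation example of Roth and Yih. Local classifier scores for two entities and one relation are summed, and a declarative constraint (a `born_in` relation requires a person and a location) rules out incoherent joint assignments. The labels, scores and constraint are hypothetical. Exhaustive enumeration is swapped in for an ILP solver because this output space has only 18 assignments; an ILP formulation would encode the same objective and constraint over 0/1 indicator variables.

```python
from itertools import product

ENTITY_LABELS = ["PER", "LOC", "ORG"]
RELATION_LABELS = ["born_in", "none"]

def satisfies_constraints(e1, e2, rel):
    """Hypothetical constraint: born_in(e1, e2) implies e1=PER and e2=LOC."""
    return rel != "born_in" or (e1 == "PER" and e2 == "LOC")

def constrained_inference(score_e1, score_e2, score_rel):
    """Return the best joint assignment that satisfies the constraints.

    Exhaustive enumeration stands in for an ILP solver; it is only
    feasible because there are 3 * 3 * 2 = 18 candidate assignments.
    """
    best, best_score = None, float("-inf")
    for e1, e2, rel in product(ENTITY_LABELS, ENTITY_LABELS, RELATION_LABELS):
        if not satisfies_constraints(e1, e2, rel):
            continue  # background knowledge prunes this assignment
        total = score_e1[e1] + score_e2[e2] + score_rel[rel]
        if total > best_score:
            best, best_score = (e1, e2, rel), total
    return best, best_score

# Hypothetical scores from independently trained local classifiers.
score_e1 = {"PER": 2, "LOC": 0, "ORG": 1}
score_e2 = {"PER": 0, "LOC": 1, "ORG": 2}
score_rel = {"born_in": 3, "none": 0}
print(constrained_inference(score_e1, score_e2, score_rel))
# (('PER', 'LOC', 'born_in'), 6)
```

Taken independently, the locally best labels are (PER, ORG, born_in), which violates the constraint; joint inference instead revises the second entity to LOC, the best coherent assignment with a total score of 6.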
Reading:
Constraint-based Models
D. Roth and W. Yih
A Linear Programming Formulation for Global Inference in Natural Language Tasks.
CoNLL'04
P=Const
D. Roth and W. Yih
Global Inference for Entity and Relation Identification via a Linear Programming Formulation
Introduction to Statistical Relational Learning, 2007
M. Richardson and P. Domingos,
Markov Logic Networks
Machine Learning Journal 2006
Applications
P=Const
James Clarke and Mirella Lapata
Constraint-Based Sentence Compression: An Integer Programming Approach
COLING/ACL 2006
P=Const
Sebastian Riedel and James Clarke,
Incremental Integer Linear Programming for Non-projective Dependency Parsing
EMNLP 2006
Pascal Denis and Jason Baldridge,
Joint Determination of Anaphoricity and Coreference Resolution using Integer Programming
NAACL 2007
P=Const
André F. T. Martins, Noah A. Smith, and Eric P. Xing,
Concise Integer Linear Programming Formulations for Dependency Parsing
ACL 2009
P=Const
Yejin Choi and Claire Cardie,
Adapting a Polarity Lexicon Using Integer Linear Programming for Domain-Specific Sentiment Classification
EMNLP
P=Const
X. Cheng and D. Roth,
Relational Inference for Wikification,
EMNLP 2013.
IV. Training Paradigms
Decoupling Learning from Inference (L+I)
Inference based Training (Joint Learning, IBT)
Online and Batch Joint Learning
Reading:
Training Paradigms: Constraint-based Models
P=Const
V. Punyakanok, D. Roth, W. Yih, and D. Zimak
Learning and Inference over Constrained Output
IJCAI'05.
P=Const, CRF
D. Roth, W. Yih
Integer Linear Programming Inference for Conditional Random Fields
ICML'05.
Distributed Output Representations
V. Srikumar and C. Manning
Learning Distributed Representations for Structured Output Prediction.
NIPS'14
Applications
P=SVM
B. Taskar, D. Klein, M. Collins, D. Koller and C. Manning,
Max-Margin Parsing
EMNLP'04
P=Perc
M. Collins,
Discriminative Reranking for Natural Language Parsing
ICML 2000
Richard Johansson and Pierre Nugues,
Dependency-based Semantic Role Labeling of PropBank
EMNLP'08
P=Const
V. Punyakanok, D. Roth and W. Yih,
The Importance of Syntactic Parsing and Inference in Semantic Role Labeling
Computational Linguistics 2008.
P=Const
Y. Yang and M-W. Chang,
S-MART: Novel Tree-based Structured Learning Algorithms Applied to Tweet Entity Linking,
ACL 2015.
P=Const
K.-W. Chang, R. Samdani and D. Roth,
A Constrained Latent Variable Model for Coreference Resolution,
EMNLP 2013.
V. Unsupervised Learning and Indirect Supervision
Constraint-Driven Learning and Posterior Regularization
Learning with latent variables
Indirect Supervision
Reading:
Constraint-Driven Learning
M. Chang, L. Ratinov, N. Rizzolo and D. Roth,
Learning and Inference with Constraints
AAAI 2008.
P=Const, CRF
M. Chang, L. Ratinov, and D. Roth,
Guiding Semi-Supervision with Constraint-Driven Learning
ACL 2007.
P=Const
K. Ganchev, J. Graca, J. Gillenwater and B. Taskar,
Posterior Regularization for Structured Latent Variable Models
JMLR 2010.
P
K. Hall, R. McDonald, J. Katz-Brown and M. Ringgaard,
Training dependency parsers by jointly optimizing multiple objectives
EMNLP 2011.
Latent Variables
P=Const, SVM
M. Chang, D. Goldwasser, D. Roth and V. Srikumar,
Discriminative Learning over Constrained Latent Representations
NAACL 2010.
P=SVM
Chun-Nam John Yu and T. Joachims,
Learning Structural SVMs with Latent Variables
ICML, 2009.
Andrew McCallum, Kedar Bellare and Fernando Pereira,
A Conditional Random Field for Discriminatively-trained Finite-state String Edit Distance
UAI, 2005.
P=Perc
Xu Sun, Takuya Matsuzaki, Daisuke Okanohara and Jun'ichi Tsujii,
Latent Variable Perceptron Algorithm for Structured Classification
IJCAI, 2009.
T. Matsuzaki, Y. Miyao and J. Tsujii,
Probabilistic CFG with Latent Annotations
ACL 2005
P=NN
Collobert and Weston
A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning.
Slav Petrov, Leon Barrett, Romain Thibaux and Dan Klein,
Learning Accurate, Compact, and Interpretable Tree Annotation
COLING/ACL 2006
Percy Liang, Slav Petrov, Michael Jordan and Dan Klein,
The Infinite PCFG using Hierarchical Dirichlet Processes
EMNLP 2007
Indirect Supervision
P=SVM, Const
M. Chang, V. Srikumar, D. Goldwasser and D. Roth,
Structured Output Learning with Indirect Supervision
ICML 2010.
P=CRF
Noah A. Smith and Jason Eisner,
Contrastive Estimation: Training Log-Linear Models on Unlabeled Data
ACL 2005.
P=CRF
G.S. Mann and A. McCallum,
Generalized Expectation Criteria for Semi-Supervised Learning with Weakly Labeled Data
JMLR 2010.
VI. Inference
Approximate Inference
Dual Decomposition
Reading:
Inference
P=SVM
T. Finley, T. Joachims,
Training Structural SVMs when Exact Inference is Intractable
ICML, 2008.
P=CRF
C. Sutton and A. McCallum
Piecewise Pseudolikelihood for Efficient Training of Conditional Random Fields
ICML 2007
P=SVM
T. Joachims, T. Finley, Chun-Nam Yu,
Cutting-Plane Training of Structural SVMs
Machine Learning, 2009.
P=Const
Terry Koo, Alexander M. Rush, Michael Collins, Tommi Jaakkola, and David Sontag,
Dual Decomposition for Parsing with Non-Projective Head Automata.
EMNLP, 2010.
Group Presentation Schedule (tentative, although the order of the groups is fairly firm)
Tentative Presentation Schedule (Google Doc)
Features
Exp Models
Perceptron
Structured SVM
Constrained Conditional Models (CCMs)
Supervision Protocols
Deep Learning (NN)
Optimization
Inference
Latent Representations