Papers

Papers that are recommended for presentation are denoted by ♥

Introduction (Lecture 1)

General: ML in NLP

The following two papers are listed for historical reasons. They are survey papers describing the state of the art in machine learning for NLP in 1999 and 2005, respectively.

C. Cardie and R. Mooney, Guest Editors’ Introduction: Machine Learning and Natural Language Processing. Machine Learning Journal. Special Issue on Natural Language Learning 1999
P. Fung and D. Roth, Guest Editors’ Introduction: Machine Learning in Speech and Language Technologies. Machine Learning Journal, Special Issue on Natural Language Learning 2005

Generative and Discriminative Models

A. Ng and M. Jordan, On Discriminative vs. Generative Classifiers: A Comparison of Logistic Regression and Naive Bayes NIPS 2002
D. Roth Learning to Resolve Natural Language Ambiguities: A Unified Approach AAAI 1998
D. Roth Learning in Natural Language IJCAI 1999

Multiclass

S. Har-Peled, D. Roth and D. Zimak, Constraint Classification for Multiclass Classification and Ranking NIPS 2003
K. Crammer and Y. Singer, Ultraconservative Online Algorithms for Multiclass Problems JMLR 2003
Y. Even-Zohar and D. Roth, A Sequential Model for Multi-Class Classification EMNLP 2001
X. Li and D. Roth, Learning Question Classifiers: The Role of Semantic Information NLE 2005
M. Gupta, S. Bengio and J. Weston, Training Highly Multiclass Classifiers JMLR 2014

Basic Structured Models: Sequential Models

BONUS: To learn how to implement the averaged perceptron efficiently (without storing or summing every intermediate weight vector), refer to Fig. 2.3 on page 19 of Hal Daumé III's thesis.
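The trick the BONUS item refers to can be sketched as follows. This is a minimal sketch of the lazy-averaging idea for a binary perceptron, assuming labels in {-1, +1} and dense feature lists; it is not a transcription of the thesis figure, whose bookkeeping (for example, where the step counter starts) may differ slightly. The names `averaged_perceptron` and `dot` are hypothetical. Instead of adding the current weight vector to a running total after every example, it keeps a second vector `u` that accumulates each update scaled by the step at which it happened, and recovers the average at the end.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length dense vectors."""
    return sum(x * y for x, y in zip(a, b))


def averaged_perceptron(
    data: Sequence[Tuple[Sequence[float], int]], epochs: int = 10
) -> Vector:
    """Binary averaged perceptron with lazy averaging (hypothetical helper).

    data: (feature vector, label) pairs with label in {-1, +1}.
    Returns the average of the weight vector taken after each of the
    T = epochs * len(data) steps, without materializing those T snapshots.
    """
    dim = len(data[0][0])
    w = [0.0] * dim  # current weight vector
    u = [0.0] * dim  # sum of c * update, with c = examples seen before the update
    c = 0            # number of examples processed so far
    for _ in range(epochs):
        for x, y in data:
            if y * dot(w, x) <= 0:  # mistake (or on the boundary): update
                for i, xi in enumerate(x):
                    w[i] += y * xi
                    u[i] += c * y * xi
            c += 1
    # An update made after c examples is present in the last (T - c) snapshots,
    # so the average of all T snapshots is w - u / T (here c == T).
    return [wi - ui / c for wi, ui in zip(w, u)]


# Usage on a toy problem; the last feature is a constant 1.0 acting as a bias.
toy = [([1.0, 2.0, 1.0], 1), ([2.0, -1.0, 1.0], -1),
       ([-1.0, -1.5, 1.0], -1), ([0.5, 1.0, 1.0], 1)]
w_avg = averaged_perceptron(toy, epochs=5)
print(w_avg)
```

Prediction then uses the sign of `dot(w_avg, x)`. The point of the design is cost: each mistake costs two vector updates instead of one, whereas naive averaging adds the full weight vector to a running total after every example, mistake or not.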

SVM

C. Burges A Tutorial on Support Vector Machines for Pattern Recognition, 1998
♥ B. Taskar, C. Guestrin and D. Koller Max-Margin Markov Networks NIPS 2003
♥ I. Tsochantaridis, T. Hofmann, T. Joachims, Y. Altun Large Margin Methods for Structured and Interdependent Output Variables JMLR 2005

Constrained Conditional Models

Constraint-based Models

D. Roth and W. Yih, A Linear Programming Formulation for Global Inference in Natural Language Tasks. CoNLL 2004
♥ D. Roth and W. Yih Global Inference for Entity and Relation Identification via a Linear Programming Formulation. Introduction to Statistical Relational Learning, 2007
M. Richardson and P. Domingos, Markov Logic Networks Machine Learning Journal 2006

BONUS: To learn how to convert Boolean constraints to ILP constraints, refer to:

W. Yih Global Inference Using Integer Linear Programming Technical Report 2004.
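As a hedged illustration of the kind of conversion the BONUS item refers to, the sketch below encodes a few textbook linearizations of Boolean constraints over 0/1 variables (implication, a variable equal to an AND, a variable equal to an OR, exactly-one). It is not taken from Yih's report, which may present these differently. The helper names (`implies`, `and_var`, `or_var`, `exactly_one`, `check`) are hypothetical, and constraints are plain tuples rather than any ILP solver's API. A brute-force checker confirms that each set of inequalities admits exactly the 0/1 assignments that satisfy the Boolean formula.

```python
import operator
from itertools import product
from typing import Callable, Dict, List, Tuple

# A linear constraint: sum_v coef[v] * x_v  (sense)  rhs, over 0/1 variables.
Constraint = Tuple[Dict[str, int], str, int]
SENSES = {"<=": operator.le, ">=": operator.ge, "==": operator.eq}


def implies(a: str, b: str) -> List[Constraint]:
    # a -> b  becomes  a <= b,  i.e.  a - b <= 0
    return [({a: 1, b: -1}, "<=", 0)]


def and_var(z: str, a: str, b: str) -> List[Constraint]:
    # z <-> (a AND b):  z <= a,  z <= b,  z >= a + b - 1
    return [({z: 1, a: -1}, "<=", 0),
            ({z: 1, b: -1}, "<=", 0),
            ({z: 1, a: -1, b: -1}, ">=", -1)]


def or_var(z: str, a: str, b: str) -> List[Constraint]:
    # z <-> (a OR b):  z >= a,  z >= b,  z <= a + b
    return [({z: 1, a: -1}, ">=", 0),
            ({z: 1, b: -1}, ">=", 0),
            ({z: 1, a: -1, b: -1}, "<=", 0)]


def exactly_one(vs: List[str]) -> List[Constraint]:
    # exactly one of vs is true:  sum(vs) == 1
    return [({v: 1 for v in vs}, "==", 1)]


def satisfied(cons: List[Constraint], assign: Dict[str, int]) -> bool:
    """True if the 0/1 assignment meets every linear constraint."""
    for coef, sense, rhs in cons:
        lhs = sum(c * assign[v] for v, c in coef.items())
        if not SENSES[sense](lhs, rhs):
            return False
    return True


def check(cons: List[Constraint], names: List[str],
          formula: Callable[[Dict[str, int]], bool]) -> bool:
    """Brute force: the feasible 0/1 points are exactly those where formula holds."""
    for bits in product((0, 1), repeat=len(names)):
        assign = dict(zip(names, bits))
        if satisfied(cons, assign) != bool(formula(assign)):
            return False
    return True


print(check(implies("a", "b"), ["a", "b"], lambda s: (not s["a"]) or bool(s["b"])))
```

Larger formulas can be handled the same way by introducing an auxiliary 0/1 variable per sub-expression (as `and_var` and `or_var` do) and conjoining the resulting inequalities.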

Applications
♥ J. Clarke and M. Lapata Constraint-Based Sentence Compression: An Integer Programming Approach COLING/ACL 2006
♥ S. Riedel and J. Clarke, Incremental Integer Linear Programming for Non-projective Dependency Parsing EMNLP 2006
♥ J. Clarke and M. Lapata Global Inference for Sentence Compression: An Integer Linear Programming Approach JAIR 2008
♥ A. F. T. Martins, N. A. Smith, and E. P. Xing, Concise Integer Linear Programming Formulations for Dependency Parsing ACL 2009
♥ Y. Choi and C. Cardie, Adapting a Polarity Lexicon Using Integer Linear Programming for Domain-Specific Sentiment Classification EMNLP 2009
♥ X. Cheng and D. Roth, Relational Inference for Wikification EMNLP 2013.

Training Paradigms

Training Paradigms: Constraint-based Models

♥ V. Punyakanok, D. Roth, W. Yih, and D. Zimak Learning and Inference over Constrained Output IJCAI 2005
♥ D. Roth, W. Yih Integer Linear Programming Inference for Conditional Random Fields ICML 2005.

Distributed Output Representations

V. Srikumar and C. Manning Learning Distributed Representations for Structured Output Prediction. NIPS 2014

Applications

♥ B. Taskar, D. Klein, M. Collins, D. Koller and C. Manning. Max-Margin Parsing EMNLP 2004
♥ M. Collins Discriminative Reranking for Natural Language Parsing ICML 2000
♥ R. Johansson and P. Nugues Dependency-based Semantic Role Labeling of PropBank. EMNLP 2008
♥ V. Punyakanok, D. Roth and W. Yih, The Importance of Syntactic Parsing and Inference in Semantic Role Labeling Computational Linguistics 2008.
♥ Y. Yang and M.-W. Chang, S-MART: Novel Tree-based Structured Learning Algorithms Applied to Tweet Entity Linking, ACL 2015.
♥ K.-W. Chang and R. Samdani and D. Roth, A Constrained Latent Variable Model for Coreference Resolution, EMNLP 2013.

Unsupervised Learning and Indirect Supervision

Constraint-Driven Learning

M. Chang, L. Ratinov, N. Rizzolo and D. Roth, Learning and Inference with Constraints AAAI 2008.
♥ M. Chang, L. Ratinov, and D. Roth, Guiding Semi-Supervision with Constraint-Driven Learning ACL 2007.
♥ K. Ganchev, J. Graça, J. Gillenwater and B. Taskar, Posterior Regularization for Structured Latent Variable Models JMLR 2010.
♥ K. Hall, R. McDonald, J. Katz-Brown and M. Ringgaard, Training dependency parsers by jointly optimizing multiple objectives EMNLP 2011.

Latent Variables

♥ M. Chang, D. Goldwasser, D. Roth and V. Srikumar, Discriminative Learning over Constrained Latent Representations NAACL 2010.
♥ Chun-Nam John Yu and T. Joachims, Learning Structural SVMs with Latent Variables ICML 2009.
A. McCallum, K. Bellare and F. Pereira, A Conditional Random Field for Discriminatively-trained Finite-state String Edit Distance UAI, 2005.
♥ X. Sun, T. Matsuzaki, D. Okanohara and J. Tsujii, Latent Variable Perceptron Algorithm for Structured Classification IJCAI 2009.
T. Matsuzaki, Y. Miyao and J. Tsujii, Probabilistic CFG with Latent Annotations ACL 2005
♥ R. Collobert and J. Weston, A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning ICML 2008.
S. Petrov, L. Barrett, R. Thibaux and D. Klein, Learning Accurate, Compact, and Interpretable Tree Annotation COLING/ACL 2006
P. Liang, S. Petrov, M. Jordan, and D. Klein, The Infinite PCFG using Hierarchical Dirichlet Processes EMNLP 2007

Indirect Supervision

♥ M. Chang, V. Srikumar, D. Goldwasser and D. Roth, Structured Output Learning with Indirect Supervision ICML 2010.
♥ Noah A. Smith and Jason Eisner, Contrastive Estimation: Training Log-Linear Models on Unlabeled Data ACL 2005.
♥ G.S. Mann and A. McCallum, Generalized Expectation Criteria for Semi-Supervised Learning with Weakly Labeled Data JMLR 2010.

Inference

♥ T. Finley, T. Joachims, Training Structural SVMs when Exact Inference is Intractable ICML 2008.
♥ C. Sutton and A. McCallum Piecewise Pseudolikelihood for Efficient Training of Conditional Random Fields ICML 2007
♥ T. Joachims, T. Finley, Chun-Nam Yu, Cutting-Plane Training of Structural SVMs Machine Learning 2009.
♥ T. Koo, A. M. Rush, M. Collins, T. Jaakkola, and D. Sontag, Dual Decomposition for Parsing with Non-Projective Head Automata. EMNLP 2010.
♥ V. Srikumar, G. Kundu and D. Roth On Amortizing Inference Cost for Structured Prediction EMNLP 2012.

Search Based Inference

♥ H. Daumé III, J. Langford, and D. Marcu, Search-based Structured Prediction Machine Learning 2009
J.R. Doppa, A. Fern and P. Tadepalli, HC-Search: A Learning Framework for Search-based Structured Prediction JAIR 2014
K.-W. Chang, A. Krishnamurthy, A. Agarwal, H. Daumé III, J. Langford, Learning to Search Better Than Your Teacher ICML 2015
T. Vieira and J. Eisner, Learning to Prune: Exploring the Frontier of Fast and Accurate Parsing TACL 2017

Deep Learning

Y. Goldberg A Primer on Neural Network Models for Natural Language Processing. JAIR 2016.

Applications

Richard Socher, John Bauer, Christopher D. Manning, and Andrew Y. Ng, Parsing With Compositional Vector Grammars. ACL 2013.
I. Sutskever, O. Vinyals, and Q. V. Le, Sequence to Sequence Learning with Neural Networks NIPS 2014.
♥ S. Wiseman and A. M. Rush. Sequence-to-sequence learning as beam-search optimization. EMNLP 2016.
♥ A. Karpathy, A. Joulin, and F. F. Li. Deep fragment embeddings for bidirectional image sentence mapping NIPS, 2014.
♥ L. Kong, C. Dyer, N. A. Smith Segmental Recurrent Neural Networks ICLR 2016.
♥ L. Yu, P. Blunsom, C. Dyer, E. Grefenstette, T. Kocisky The Neural Noisy Channel ICLR 2017.
♥ Y. Kim, C. Denton, L. Hoang, A. M. Rush Structured Attention Networks ICLR 2017.
♥ E. Kiperwasser, Y. Goldberg Easy-First Dependency Parsing with Hierarchical Tree LSTMs TACL 2016.
