Spring 2017
Updated notes will be available here as ppt and pdf files after each lecture. Older lecture notes are provided before class for students who want to consult them beforehand. Pointers to relevant material will also be made available;
I assume you will look at least at the Reading and the starred (*) references.
The dates next to the lecture notes are tentative; some of the material as well as the order of the lectures may change during the semester.
-
Lecture #0: Course Introduction and
Motivation, pdf
Reading: Mitchell, Chapter 1
-
Lecture #1: Introduction to Machine
Learning, pdf
Also see: Weather - Whether Example
Reading: Mitchell, Chapter 2
-
Tutorial: Building a Classifier with Learning Based Java,
pdf,
pdf2
Walkthrough on using LBJava with examples.
-
Lecture #2: Decision Trees, pdf
Additional notes: Experimental
Evaluation
Reading: Mitchell, Chapter 3
References
-
J. Quinlan, "Induction of Decision Trees". Machine Learning, 1:81-106,
1986.
-
(*)
R. Rivest, "Learning Decision Lists". Machine Learning, 2(3):229-246,
1987.
(link)
-
J. Quinlan and R. Rivest, "Inferring Decision Trees Using the Minimum
Description Length Principle". Information and Computation,
80:227-248, 1989.
-
T. Dietterich, "Approximate Statistical Tests for Comparing Supervised
Classification Learning Algorithms", Neural Computation 10(7), 1998.
- Learning Rules + ILP
(used to be Lecture #3, will not be covered in Fall 2016)
Reading: Mitchell, Chapter 10
References
-
(*)
W. Cohen, "Fast Effective Rule Induction". ICML, 1995.
(citeseer)
-
W. Cohen and Y. Singer, "A Simple, Fast, and Effective Rule Learner".
AAAI, 1999.
(link)
-
Bratko, I. and Muggleton, S. "Applications of Inductive Logic
Programming". Commun. ACM 38, 11 (Nov. 1995), 65-70.
(acm)
- Lecture #4: On-Line Learning: Winnow, Perceptron:
P1.pptx, P2.pptx, P1.pdf, P2.pdf, notes(1), notes(2), notes(3)
References
-
(*)
D. Roth, "On-Line Learning of Linear Functions (course notes)".
2000.
(.pdf)
-
(*)
J. Kivinen and M. Warmuth, "The Perceptron Algorithm vs. Winnow:
Linear vs. Logarithmic Mistake Bounds when few Input Variables are
Relevant". 1995.
(link)
-
A. Blum, "On-Line Algorithms in Machine Learning". 1996.
(link)
-
(*)
A. Blum, "Learning Boolean Functions in an Infinite Attribute
Space". Machine Learning, 9(4):373-386, 1992.
(.ps)
-
R. Khardon, D. Roth, and R. Servedio, "Efficiency versus Convergence
of Boolean Kernels for On-Line Learning Algorithms". NIPS, 2001.
(link)
-
(*)
Y. Freund and R. Schapire, "Large Margin Classification Using the
Perceptron Algorithm". COLT, 1998.
(link)
-
N. Littlestone, "Learning Quickly When Irrelevant Attributes Abound".
Machine Learning 2(4):285-318, 1988.
(link)
-
Adam J. Grove, Nick Littlestone, Dale Schuurmans, "General Convergence
Results for Linear Discriminant Updates". Machine Learning 43(3):
173-210 (2001)
(link)
-
Shai Ben-David and Hans Ulrich Simon,
"Efficient Learning of Linear Perceptrons", NIPS 2000
(link)
-
Tong Zhang, "Large Margin Winnow Methods for Text Categorization".
(.ps)
-
Tong Zhang and Frank J. Oles, "Text Categorization Based on Regularized
Linear Classification Methods". Information
Retrieval, 4:5-31, 2001.
- R. Khardon and G. Wachman,
"Noise Tolerant Variants of the Perceptron
Algorithm". Journal of Machine Learning
Research, 8:227-248, 2007.
- K. Crammer, O. Dekel, J. Keshet, S.
Shalev-Shwartz, and Y. Singer. Online
Passive-Aggressive Algorithms. (link)
- John Duchi, Elad Hazan, and Yoram Singer.
Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. JMLR. 12 (July 2011), 2121-2159.
(pdf)
-
Lecture #5: Computational Learning
Theory, pdf
Reading: Mitchell, Chapter 7
References
-
Kearns and Vazirani,
Introduction to Computational Learning Theory
-
(*)
L. Valiant, "A Theory of the Learnable". CACM, 27(11):1134-1142, 1984. (link)
-
L. Pitt and L. Valiant, "Computational Limitations on Learning From
Examples". JACM, 35(4):965-984, 1988.
(.pdf)
-
A. Blumer, A. Ehrenfeucht, D. Haussler, and M. Warmuth, "Learnability
and the Vapnik-Chervonenkis Dimension". JACM, 36(4):929-965, 1989.
(.pdf)
-
V. Vapnik and A. Chervonenkis, "On the Uniform Convergence of Relative
Frequencies of Events to Their Probabilities". Theory of Probability
and Its Applications, 16(2):264-280, 1971.
(link)
-
(*)
David Haussler: Quantifying Inductive Bias: AI Learning Algorithms
and Valiant's Learning Framework. Artif. Intell. 36(2): 177-221 (1988)
(link)
-
David Haussler: Learning Conjunctive Concepts in Structural Domains.
Machine Learning 4: 7-40 (1989)
(link)
-
Lecture #6: Neural Networks, NN-P1.pptx, NN-P1.pdf, NN-P1-New.pptx, NN-P1-New.pdf, NN-P2.pptx, NN-P2.pdf, NN-P2-New.pptx, NN-P2-New.pdf
References
-
Rumelhart, David E., Geoffrey E. Hinton, and Ronald J. Williams. "Learning representations by back-propagating errors." Cognitive modeling 5 (1988): 3.
(link)
-
Barron, Andrew R. Approximation and estimation bounds for artificial neural networks. Machine Learning, 14: 115-133, 1994.
(link)
-
Livni, Roi, Shai Shalev-Shwartz, and Ohad Shamir. "On the computational efficiency of training neural networks." In Advances in Neural Information Processing Systems, pp. 855-863. 2014. (link)
- Presentation: "On the computational complexity of deep learning", by Shai Shalev-Shwartz in 2015 (link)
-
Blum, Avrim L., and Ronald L. Rivest. "Training a 3-node neural network is NP-complete." In Machine learning: From theory to applications, pp. 9-28. Springer Berlin Heidelberg, 1993. (link)
-
Lecture #6: Boosting, pdf,
Formal View
References
-
Robert E. Schapire, "The Strength of Weak Learnability".
Machine Learning 5(2):197-227, 1990
-
Yoav Freund and Robert E. Schapire, "A decision-theoretic
generalization of on-line learning and an application to
boosting". Journal of Computer and System Sciences,
55(1):119-139, 1997. (.ps)
-
Erin L. Allwein, Robert E. Schapire and Yoram Singer, "Reducing
multiclass to binary: A unifying approach for margin
classifiers". Journal of Machine Learning Research, 1:113-141,
2000. (.pdf)
-
Robert E. Schapire, Yoav Freund, Peter Bartlett and Wee Sun Lee,
"Boosting the margin: a new explanation for the effectiveness of
voting methods". The Annals of Statistics, 26(5):1651-1686,
1998. (.ps)
-
Lecture #7: Multiclass Classification,
pdf
References
- Sariel Har-Peled, Dan Roth and Dav Zimak,
"Constraint Classification for Multiclass Classification and Ranking".
NIPS, 2003. (.pdf)
- Midterm Review,
pdf
- Midterm Exam (during class)
-
Lecture #8: Support Vector Machines,
pdf
Additional Notes on Optimization and
SVMs
Additional Notes on Logistic Regression and
SVMs
References
-
C.-J. Lin, Optimization, Support Vector Machines, and Machine
Learning. Talk in DIS, University of Rome and IASI, CNR,
Italy. September 1-2, 2005.
(slides)
-
C. Burges, "A Tutorial on Support Vector Machines for Pattern
Recognition". Data Mining and Knowledge Discovery, 2(2):121-167,
1998.
(citeseer)
-
Lecture #9: Bayesian
Learning,
pdf
Additional Notes: naive Bayes (1) pdf,
naive Bayes (2) pdf
Reading: Mitchell, Chapter 6
-
Lecture #10: The EM Algorithm,
pdf
-
Lecture #11: Learning Probability Distributions,
pdf
-
Lecture #12: Clustering,
pdf
- Final Review,
pdf
- Final Exam (May 9th, 2017)
Dan Roth