CIS 419/519 Introduction to Machine Learning

Fall 2016, University of Pennsylvania

Instructor:  Eric Eaton, Ph.D.



Syllabus and Schedule

This is a tentative syllabus and schedule.  Topics, reading assignments, due dates, and exam dates are subject to change.
All assignments and projects are due by 11:59:59pm Eastern time on the day listed.

The readings will come from Machine Learning (Flach), Learning from Data (LfD), the reading packet (Handout), or online sources.

Recordings of the lectures are available online.

Wk | Date | Topic | Reading | Assignments | Comments
1 | W 8/31 | Introduction, Framing a Learning Problem | Flach: Prologue & Ch. 1 | Sign up for Piazza | Optional reading (but strongly suggested): The Discipline of Machine Learning
2 | M 9/5 | Labor Day (no class) | | |
  | W 9/7 | Supervised Learning & Decision Trees | Handout: Decision Trees | Assignment 1 out; [Assignment 1 Skeleton]; [Assignment 1 Submission Instructions] | Slides from python tutorial
3 | M 9/12 | Decision Trees & Overfitting, k-Nearest Neighbor | | |
  | W 9/14 | Evaluation | Flach 2.1-3.2 | |
4 | M 9/19 | Linear Regression and Gradient Descent | LfD 1.1, 3.1-3.2.1; Optional linear algebra review: Barber 29.1.1-29.1.9 | | Alternative reading if you don't have LfD: Flach 7.1-7.2
  | W 9/21 | Regularization (see above slides); Linear Classification using the Perceptron | LfD 3.3 | | Alternative reading if you don't have LfD: Flach 7.4
5 | M 9/26 | Logistic Regression | LfD 4.1-4.2 | Assignment 1 due |
  | W 9/28 | Why Machine Learning Works: VC Dimension & Generalization Bounds | Learning Theory notes, LfD 3.2.2 | | For extra help on learning theory, read LfD Ch. 2
6 | M 10/3 | Support Vector Machines & Kernels | Flach 7.3; LfD 3.4 | | Alternative reading if you don't have LfD: Flach 7.5
  | W 10/5 | | Flach 7.5; Bennett article | |
7 | M 10/10 | Ensemble Methods | Flach 11.2, LfD 4.3 | Project Proposal due |
  | W 10/12 | Review | | Assignment 2 due (Due date extended to Oct 14th); [Assignment 2 Skeleton]; [Assignment 2 Submission Instructions] |
8 | M 10/17 | Midterm Exam | | | Old exams: Fall 2014 Midterm Exam, Fall 2015 Midterm Exam
  | W 10/19 | Probability Review | Generative Model notes (Section 2 only) | |
9 | M 10/24 | Naive Bayes | skim Flach 9.1; Generative Model notes | | For extra help with naive Bayes, read Flach 9.2
  | W 10/26 | Text Classification & Evaluation | | |
10 | M 10/31 | Neural Networks | | |
  | W 11/2 | | | Assignment 3 due; [Assignment 3 skeleton]; [Assignment 3 Submission Instructions] |
11 | M 11/7 | Deep Learning | Handout: Deep Learning | |
  | W 11/9 | | | | For more detail on deep learning, see: Bengio article (optional reading)
12 | M 11/14 | Unsupervised Learning: K-Means & GMMs | Flach 8.4-8.6, 10.3 | |
  | W 11/16 | Reinforcement Learning | Sutton & Barto Ch. 3, Ch. 4 | Assignment 4 due; [Assignment 4 skeleton]; [Assignment 4 Submission Instructions] |
13 | M 11/21 | Reinforcement Learning (continued) | Sutton & Barto Ch. 6; RL notes | Project Status Report due Tuesday 11/22; [LaTeX template] |
  | W 11/23 | No Class (Friday class schedule) | | |
14 | M 11/28 | Principal Components Analysis, Image Features | | |
  | W 11/30 | Learning on Networks, Machine Learning for Big Data | | Assignment 5 due; [Assignment 5 skeleton] |
15 | M 12/5 | Special Topics: TBA (Dr. Eaton out of town) | | |
  | W 12/7 | | | |
16 | M 12/12 | Review (Dr. Eaton out of town) | | Final Project Report and Summary Slides Due; [LaTeX template] |
  | Thurs. 12/15 @ 12pm in MEYH B1 | Final Exam (in MEYH B1) | | | Old exams: Fall 2014 Final Exam, Fall 2015 Final Exam



Contact


INSTRUCTOR

Eric Eaton, Ph.D.

E-mail: -- Make certain you put "[CIS 419]" or "[CIS 519]" at the start of the subject line to ensure a quicker response.  However, please only email me about personal matters.  Any questions about course materials, logistics, etc. should be posted to Piazza in a private post.

Office Hours:  MW 2:00-2:50PM, and by appointment
Office:  Levine 264

For any course-related matters (including questions about material, grades, logistics, etc.), please send me your question via Piazza in a private post.  This makes it MUCH easier to ensure that I respond to all course-related questions quickly.  However, if you have something of a more personal nature, please feel free to email me.  I make a concerted effort to respond to all e-mails within 24 hours on weekdays and 48 hours on weekends (often, much less!).  You're also welcome to stop by my office hours anytime to talk about the course or anything else that interests you.


Teaching Assistant

Kathy Chen (Senior undergraduate, CIS & BCHE)

E-mail:

Office Hours:  Monday 2:15PM-4:15PM, Levine 5th floor bump space by elevators

Research Interests: developing machine learning based approaches to capture useful biological patterns (bioinformatics, systems biology)


Teaching Assistant

Xiang Deng (Master's Student, Robotics)

E-mail:

Office Hours:  Thursday: 5-7pm, DRLB 2N36

Research Interests: Vision, Robot Planning and Control, Probabilistic Inference and Modeling


Teaching Assistant

Tianli Han (Master's student, CIS)

E-mail:

Office Hours:  Friday: 1-3pm, Benn 140

Research Interests: Natural Language Processing, Computer Vision



Teaching Assistant

Kris Jordan (2nd year PhD student, CIS)

E-mail:

Office Hours:  Fridays 2-4pm, common area on the 4th floor of Levine (near L474)

Research Interests: computer vision and robotics


Teaching Assistant

Thomas Lee (Senior undergraduate, CIS / Master’s student, ESE)

E-mail:

Office Hours:  Mondays 11am-12 pm and 1:30-2:30pm, Levine 5th floor bump space by elevators

Research Interests: Machine learning applications to financial markets, energy systems (integration of intermittent renewables)


Teaching Assistant

Kelsey Saulnier (3rd year PhD student, ESE)

E-mail:

Office Hours:  Thursday 1-3pm,  Towne 211

Research Interests: Information-based multi-robot exploration and swarm security


Teaching Assistant

Zheng Wu (2nd year Master's student, CIS)

E-mail:

Office Hours:  Tuesday 4:30 - 6:30 pm, Towne 215

Research Interests: Machine learning, data mining, and large scale cloud computing



Course Information

Course Description

Machine learning has been essential to the success of many recent technologies, including autonomous vehicles, search engines, genomics, automated medical diagnosis, image recognition, and social network analysis, among many others. This course will introduce the fundamental concepts and algorithms that enable computers to learn from experience, with an emphasis on their practical application to real problems.  It will cover supervised learning (decision trees, logistic regression, support vector machines, Bayesian methods, neural networks and deep learning), unsupervised learning (clustering, dimensionality reduction), and reinforcement learning.  Additionally, the course will discuss evaluation methodology and recent applications of machine learning, including large scale learning for big data and network analysis.

Prerequisites: CIS121

Course Website: http://www.seas.upenn.edu/~cis519/

Time:  Monday/Wednesday, noon to 1:30 pm
Location:  Wu and Chen Auditorium (Levine 101)


Comparison to CIS 520 (Machine Learning)

Due to overwhelming demand, Penn is offering two different machine learning courses this semester:  CIS 419/519 (Introduction to Machine Learning) and CIS 520 (Machine Learning).  This section briefly describes the differences between these courses.

CIS 419/519 Introduction to Machine Learning (this course!) is an introductory-level course in machine learning (ML) with an emphasis on applying ML techniques. The course is cross-listed between undergraduate (419) and graduate (519) versions; the graduate course 519 has somewhat different requirements as described below.  CIS 419/519 is intended for students who are interested in the practical application of existing machine learning methods to real problems, rather than in the statistical foundations and theory of ML covered in CIS 520.  Just because it is listed as "introductory" does not necessarily mean that it is "easier".

CIS 520 Machine Learning is a more mathematically rigorous course in statistical machine learning that provides the background necessary to design and use new ML algorithms.  Consequently, CIS 520 requires students to have basic knowledge of linear algebra (matrices, eigenvectors, etc.).  It uses Matlab and is said to require a lot of work, but prepares students to conduct ML research.
 
CIS 519 is NOT a prerequisite for CIS 520.  However, it makes little sense to take CIS 519 after having already taken CIS 520.  You certainly may take CIS 419/519 first and then later take CIS 520.

Essentially, you should take CIS 419/519 if:

And, you should take CIS 520 if you're confident in your mathematical background and:


Additional Requirements for CIS 519

Students registered for the graduate version of this course (CIS 519) will be required to complete additional work throughout the semester.  This work will include additional components to the homework, additional requirements on the course project, and (possibly) different or additional questions on the exams.

Since the two versions have different requirements, you cannot complete the course as CIS 419 and later petition to have it changed to CIS 519 for graduate credit; if you're considering changing this course to CIS 519 for graduate credit, you should register for the graduate version now.



Text & Software

Textbooks


CIS 419/519 Course Reading Packet

This is a collection of readings that will be used throughout the course. There is NOT a single reading packet you need to obtain; readings will be distributed incrementally throughout the semester, either in hard copy or posted online.



Learning From Data by Y. S. Abu-Mostafa, M. Magdon-Ismail, and H.T. Lin.
AML Book


Machine Learning: The Art and Science of Algorithms That Make Sense of Data by Peter Flach
Cambridge University Press


Throughout the course, you may find it useful to consult the following resources:

For a more advanced treatment of machine learning topics, I would recommend one of the following books:

Software



Course Policies

Communication

Attendance and active participation are expected in every class. Participation includes asking questions, contributing answers, proposing ideas, and providing constructive comments.

As you will discover, I am a proponent of two-way communication and I welcome feedback during the semester about the course. I am available to answer student questions, listen to concerns, and talk about any course-related topic (or otherwise!). Come to office hours! This helps me get to know you. You are welcome to stop by and chat. There are many more exciting topics to talk about that we won't have time to cover in class.

Whenever you e-mail me, be sure to use a meaningful subject line and include the phrase "[CIS 419]" or "[CIS 519]" at the beginning of the subject line. Your e-mail will catch my attention and I will respond more quickly if you do this. I make an effort to respond to e-mails within 24 hours on weekdays and 48 hours on weekends.  However, unless it is a private matter, you should be posting your questions/issues to Piazza.

Although computer science work can be intense and solitary, please stay in touch with me and the TAs, particularly if you feel stuck on a topic or project and can't figure out how to proceed. Often a quick e-mail, face-to-face conference, or Piazza post can reveal solutions to problems and generate renewed creative and scholarly energy. It is essential that you begin assignments and projects early, since we will be covering a variety of challenging topics in this course.


We will be using Piazza as the course message board.  We also make course-wide announcements through Piazza, so be sure to sign up for it.  You are responsible for the content of all announcements on Piazza.


Grading

Your grade will be based upon five homework assignments, two exams, and a course project.  Assignments must be submitted according to the assignment submission instructions.

At the end of the semester, final grades will be calculated as a weighted average of all grades according to the following weights:

Assignments: 40% (8% each)
Midterm Exam: 15%
Final Exam: 20%
Project: 25%
Total: 100%

The project grade will be broken down further in the Project Description.
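
As an illustration of the weighted average described above, here is a minimal Python sketch. The weights come from the grading breakdown; the function name `final_percentage`, the equal weighting of the five assignments (8% each), and the sample scores are assumptions made for this example, not an official grade calculator.

```python
# Weights from the grading breakdown, in percent of the final grade.
WEIGHTS = {"assignments": 40.0, "midterm": 15.0, "final": 20.0, "project": 25.0}


def final_percentage(assignments, midterm, final, project):
    """Return the weighted final percentage (0-100).

    `assignments` is a list of five percentage grades; each counts for 8%,
    which is the same as giving their plain average a 40% weight.
    """
    if len(assignments) != 5:
        raise ValueError("expected five assignment grades")
    scores = {
        "assignments": sum(assignments) / len(assignments),
        "midterm": midterm,
        "final": final,
        "project": project,
    }
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100.0


# Hypothetical scores, for illustration only.
example = final_percentage([90, 85, 100, 95, 80], midterm=80, final=85, project=90)
print(example)
```

With these hypothetical scores the assignment average is 90, so the result is 0.40 * 90 + 0.15 * 80 + 0.20 * 85 + 0.25 * 90 = 87.5.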

Incomplete grades will be given only for verifiable medical illness or other such dire circumstances.

All graded work will receive a percentage grade between 0% and 100%.  Here is how the percentage grades will map to final letter grades; percentages are not rounded:

Percentage | Letter grade
97% <= | A+ (4.0)
93% <= | A (4.0)
90% <= | A- (3.7)
87% <= | B+ (3.3)
83% <= | B (3.0)
80% <= | B- (2.7)
77% <= | C+ (2.3)
73% <= | C (2.0)
70% <= | C- (1.7)
67% <= | D+ (1.3)
60% <= | D (1.0)
< 60% | F (0.0)

The instructor reserves the right to adjust the percentage ranges for each letter grade upward in your favor.
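
The percentage-to-letter mapping can be sketched as a simple lookup. The cutoffs below are the published ones; the names `CUTOFFS` and `letter_grade` are hypothetical, and the sketch does not model any upward adjustment the instructor may apply.

```python
# (minimum percentage, letter grade, grade points), highest cutoff first.
CUTOFFS = [
    (97, "A+", 4.0), (93, "A", 4.0), (90, "A-", 3.7),
    (87, "B+", 3.3), (83, "B", 3.0), (80, "B-", 2.7),
    (77, "C+", 2.3), (73, "C", 2.0), (70, "C-", 1.7),
    (67, "D+", 1.3), (60, "D", 1.0),
]


def letter_grade(percentage):
    """Map a final percentage to (letter, grade points) without rounding."""
    for minimum, letter, points in CUTOFFS:
        if percentage >= minimum:
            return letter, points
    return "F", 0.0


print(letter_grade(89.99))  # below the 90% cutoff, so not an A-
```

Because percentages are not rounded, 89.99% falls in the B+ range rather than the A- range.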


Academic Integrity

All work in this course is subject to the University's Academic Integrity policy.  Violations of the academic integrity policy or the course collaboration policy will incur consequences according to university regulations.  Penalties for academic dishonesty may lower the final grade in the course.  If one student shares code with another on a different team, both the donor and the recipient of the code are in violation of the academic integrity policy and will be referred to the Office of Student Conduct.

If required by any assignment, you must list all people you worked with or consulted, and all resources you consulted (excluding the course textbooks and notes) during the completion of the assignment.


Submission and Late Policy

All work must be turned in either in hard copy or by electronic submission, depending on the instructions given in the assignment.  E-mailed submissions will not be accepted.  Extensions will be given only in the case of verifiable medical excuses or other such dire circumstances, if requested in advance.

Late submissions will receive a penalty of 15% for each 24-hour period, or part thereof, past the due date and time (e.g., an assignment turned in 25 hours late will receive a penalty of 30%).  Submissions received more than one week late will not be accepted.

Everyone will receive two free late days.  It is up to you to track how many late days you've used and mark your count at the top of each late hardcopy submission.  (For example, if you turn in one assignment 3 hours late, you would write at the top "Turning in 3 hours late, using 1 out of 2 late days.")  Late days cannot be used on any component of the project or any in-class presentation/event that affects other students.
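
The late-penalty arithmetic can be sketched as follows. The 15% rate, the one-week cutoff, and the 25-hour example come from the policy above. Everything else is an assumption made for illustration: the function name `late_penalty`, the idea that each free late day cancels exactly one 24-hour period, the cap at 100%, and the choice to leave the one-week cutoff unaffected by late days. Check with the instructor for the actual handling of edge cases.

```python
import math

PENALTY_PER_PERIOD = 15  # percent per started 24-hour period (from the policy)
MAX_HOURS_LATE = 7 * 24  # more than one week late: not accepted


def late_penalty(hours_late, late_days_used=0):
    """Return the penalty in percent, or None if the submission is not accepted."""
    if hours_late <= 0:
        return 0  # on time
    if hours_late > MAX_HOURS_LATE:
        return None  # more than one week late
    periods = math.ceil(hours_late / 24)  # 25 hours late counts as 2 periods
    # ASSUMPTION: each free late day cancels one 24-hour period.
    charged = max(0, periods - late_days_used)
    # ASSUMPTION: the penalty is capped at 100%.
    return min(100, charged * PENALTY_PER_PERIOD)


print(late_penalty(25))                    # two started periods
print(late_penalty(3, late_days_used=1))   # covered by one free late day
```

Under these assumptions, a submission 25 hours late with no late days applied is charged 30%, matching the example in the policy, and a submission 3 hours late with one late day applied is charged nothing.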


Exams

There will be two exams in this course.  The exams will be closed-book and closed-notes.  They will cover material from lectures, homeworks, and assigned readings (including topics not discussed in class).  So, keep up with those readings!


Collaboration Policy

I want to encourage you to discuss the material and work together to understand it. Here are my thoughts on collaborating with other students:

If you have any questions as to what types of collaborations are allowed and which are dishonest, please ask me before you make a mistake.


Electronic Devices

I have no problem with you using computers or tablets to take notes or consult reference materials during class.  Tempting though it may be, please do not check e-mail or visit websites that are not relevant to the course during class.  It is a distraction, both for you and (more importantly) for your fellow classmates.  Please silence your phones and computers when you enter class.


Reference Links

Useful references will be posted here throughout the semester.

Article by David Mimno on Data Pre-Processing


LaTeX Resources: