CIS 620: ADVANCED TOPICS IN
ARTIFICIAL INTELLIGENCE
SPRING 2002
URL for this page: http://www.cis.upenn.edu/~mkearns/teaching/cis620/cis620.html
INSTRUCTORS
Prof. Michael Kearns
mkearns@cis.upenn.edu
Office hours: Wednesday 2-4 PM in 555 Moore/GRW
Prof. Lawrence Saul
lsaul@cis.upenn.edu
Office hours: Wednesday 2-4 PM in 557 Moore/GRW
COURSE DESCRIPTION
Over the last decade, significant advances have been made in a number
of core problems of artificial intelligence (AI). These advances
share an emphasis on probabilistic models of uncertainty, compact
representations of complex environments, and efficient algorithms for
reasoning, acting, and learning from experience. We will survey the
main problems of probabilistic AI, including planning and learning in
Markovian environments, reasoning under uncertainty via probabilistic
graphical models, and interaction between rational agents in
game-theoretic settings. We will emphasize the significant
connections between these problems and the solutions proposed for
them. The course will feature a mix of theory and application, with
formal mathematical analyses of algorithms as well as frequent
in-class computer demonstrations.
COURSE LOCATION AND TIME
Monday and Wednesday 10:30 - 12 in Towne 321.
PREREQUISITES
Introduction to AI at the level of CIS 520. Some multivariable
calculus, linear algebra, and elementary probability theory would be
helpful, but not essential. This course is aimed at graduate students
and advanced undergraduates.
COURSE REQUIREMENTS
Problem sets and a course project; maybe occasional scribe notes
and presentations of readings.
RECOMMENDED BOOKS
Frequent supplementary research readings will also be made available on this page.
COURSE NEWSGROUP
There is a newsgroup for the course: upenn.cis.cis620. Students are
free to communicate with each other there, and we will occasionally
post news and information about the course and problem sets.
INFORMATION ON COURSE PROJECTS
PROBLEM SETS, LECTURE NOTES, HANDOUTS
Note on MATLAB access:
The PC labs are in the Towne building. You can get full details about the labs at
http://www.seas.upenn.edu/cets/labs.html.
The PC labs require an Eniac account for access. Anyone with a gradient
account is also entitled to an Eniac account if she or he doesn't
already have one.
Problem Set #1 (due Weds Jan 23, start of class)
[Postscript]
[PDF]
A clarifying note on definitions
[Postscript]
[PDF]
Solutions to Problem Set #1
[Postscript]
[PDF]
Paper: "An Upper Bound on the Loss from
Approximate Optimal-Value Functions", by Singh and Yee
[Postscript]
[PDF]
Problem Set #2 (due Weds Feb 6, start of class)
[Postscript]
[PDF]
Related paper: "A Sparse Sampling Algorithm for Near-Optimal
Planning in Large Markov Decision Processes", by MK, Mansour, Ng
[Postscript]
[PDF]
Link to Prof. Saul's lecture note directory here
Lecture Notes on Probability
[Gzipped Postscript]
[PDF]
Lecture Notes on Regression
[Gzipped Postscript]
[PDF]
Problem Set #3 (Due Wed Feb 21, start of class)
[Gzipped Postscript]
[PDF]
Related MATLAB binary data files:
nasdaq_lag5.mat
digits2vs3.mat
Lecture Notes on Convexity
[Gzipped Postscript]
[PDF]
Lecture Notes on Iterative Scaling
[Gzipped Postscript]
[PDF]
Problem Set #4 (Due Wed Feb 27, start of class)
[Gzipped Postscript]
[PDF]
Lecture Notes on State Aggregation
[Gzipped Postscript]
[PDF]
Lecture Notes on Sensor Fusion
[Gzipped Postscript]
[PDF]
Lecture Notes on HMMs
[Gzipped Postscript]
[PDF]
Problem Set #6 (Due Monday Mar 25)
[Gzipped Postscript]
[PDF]
Problem Set #7 (due Mon Apr 1)
[Postscript]
[PDF]
MK tutorial covering inference in Bayes nets and reinforcement learning
[Postscript]
[PDF]
DETAILED COURSE OUTLINE
The outline below is subject to ongoing revision. In general, we'll try
to modify the outline both in advance (to reflect anticipated changes
in material and scheduling of topics) and in retrospect (to
reflect what was actually covered in each past week). Just to
be fancy, we'll change the color of subjects we've already covered
to blue.
WEEK 1 (Jan 7):
INTRODUCTION; PLANNING IN MDPs
Related Readings
Problem Set #1 (due Weds Jan 23, start of class)
[Postscript]
[PDF]
A clarifying note on definitions
[Postscript]
[PDF]
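Not part of the official handouts, but for orientation: a minimal sketch of value iteration for planning in a finite MDP. It is written in Python/NumPy purely for illustration (the course materials use MATLAB), and the array shapes and parameter values below are assumptions.

    import numpy as np

    def value_iteration(P, R, gamma=0.9, tol=1e-6):
        # P: assumed transition array, P[a, s, s'] = Pr(s' | s, a), shape (A, S, S)
        # R: assumed reward vector over states, shape (S,)
        # Returns an approximately optimal value function and a greedy policy.
        V = np.zeros(P.shape[1])
        while True:
            Q = R + gamma * (P @ V)        # Q[a, s] = R(s) + gamma * sum_s' P(s'|s,a) V(s')
            V_new = Q.max(axis=0)          # Bellman optimality backup
            if np.max(np.abs(V_new - V)) < tol:
                return V_new, Q.argmax(axis=0)
            V = V_new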
WEEK 2 (Jan 14):
FOUNDATIONS OF PROBABILITY; LARGE DEVIATION THEORY; LEARNING IN MDPs
Related Readings
Lecture Notes for Jan 14 (Foundations of
Probability)
[Gzipped Postscript]
[PDF]
Paper: "On the Complexity of Solving
Markov Decision Processes",
by Littman, Dean, and Kaelbling
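For reference (a standard fact, not taken from the week's handouts): the basic large deviation bound underlying much of this analysis is Hoeffding's inequality. If X_1, ..., X_n are independent random variables taking values in [0,1] and Xbar is their average, then for any eps > 0,

    Pr[ |Xbar - E[Xbar]| >= eps ]  <=  2 exp(-2 n eps^2),

so empirical averages concentrate around their expectations at a rate that improves exponentially in n.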
WEEK 3 (Jan 21; Short week, MLK day):
LEARNING IN MDPs: TD(k) ANALYSIS
Paper: " 'Bias-Variance' Error Bounds for Temporal Difference Updates", by MK and Singh, gives a fairly brief treatment of the TD(k) material we covered (lots of details missing, though) [Postscript] [PDF]
WEEK 4 (Jan 28):
Q-LEARNING; EXPLORATION VS. EXPLOITATION; LARGE STATE SPACES
Problem Set #2 (due Weds Feb 6, start of class)
[Postscript]
[PDF]
Paper: "A Sparse Sampling Algorithm for Near-Optimal
Planning in Large Markov Decision Processes", by MK, Mansour, Ng
[Postscript]
[PDF]
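For orientation only (the problem set defines its own setting), here is what a tabular Q-learning episode with epsilon-greedy exploration might look like. The env interface and all parameter values below are hypothetical, and Python/NumPy is used purely for illustration.

    import random
    import numpy as np

    def q_learning_episode(Q, env, alpha=0.1, gamma=0.9, epsilon=0.1):
        # Q: table of shape (num_states, num_actions)
        # env: hypothetical interface with reset() -> s and step(s, a) -> (s', r, done)
        s = env.reset()
        done = False
        while not done:
            if random.random() < epsilon:
                a = random.randrange(Q.shape[1])       # explore
            else:
                a = int(np.argmax(Q[s]))               # exploit
            s_next, r, done = env.step(s, a)
            target = r if done else r + gamma * np.max(Q[s_next])
            Q[s, a] += alpha * (target - Q[s, a])      # Q-learning backup
            s = s_next
        return Q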
WEEK 5 (Feb 4):
LINEAR AND LOGISTIC REGRESSION; GRADIENT AND NEWTON METHODS
Lecture Notes on Regression
[Gzipped Postscript]
[PDF]
Problem Set #3 (Due Wed Feb 21, start of class)
[Gzipped Postscript]
[PDF]
Data sets in MATLAB (binary) format:
nasdaq_lag5.mat
digits2vs3.mat
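To make the connection between logistic regression and Newton's method concrete (a sketch under assumed conventions, not the notation of the lecture notes or the MATLAB setup of the problem set):

    import numpy as np

    def logistic_regression_newton(X, y, num_iters=20):
        # X: (n, d) design matrix; y: (n,) labels in {0, 1}  (assumed conventions)
        n, d = X.shape
        w = np.zeros(d)
        for _ in range(num_iters):
            p = 1.0 / (1.0 + np.exp(-X @ w))          # predicted probabilities
            grad = X.T @ (y - p)                      # gradient of the log-likelihood
            H = -(X * (p * (1 - p))[:, None]).T @ X   # Hessian of the log-likelihood
            w = w - np.linalg.solve(H, grad)          # Newton step (iteratively reweighted least squares)
        return w

Plain gradient ascent would replace the Newton step with w = w + eta * grad for a small step size eta, trading faster per-iteration cost for slower convergence.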
WEEK 6 (Feb 11):
PREDICTION AND ITERATIVE SCALING
WEEK 7 (Feb 18):
STATE AGGREGATION AND CLUSTERING
Lecture Notes on Iterative Scaling
[Gzipped Postscript]
[PDF]
Problem Set #4 (Due Wed Feb 27, start of class)
[Gzipped Postscript]
[PDF]
Lecture Notes on State Aggregation
[Gzipped Postscript]
[PDF]
WEEK 8 (Feb 25):
DIMENSIONALITY REDUCTION AND FUSION
Lecture Notes on Sensor Fusion
[Gzipped Postscript]
[PDF]
WEEK 9 (Mar 4):
HIERARCHY AND DYNAMICS
Lecture Notes on HMMs
[Gzipped Postscript]
[PDF]
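As a small illustration of the kind of dynamic-programming recursion the HMM notes develop (the variable names and Python/NumPy style here are assumptions, not the notes' notation), the forward algorithm computes the likelihood of an observation sequence:

    import numpy as np

    def hmm_likelihood(pi, A, B, obs):
        # pi: (S,) initial state distribution
        # A:  (S, S) transition matrix, A[i, j] = Pr(next state j | current state i)
        # B:  (S, M) emission matrix,   B[i, o] = Pr(symbol o | state i)
        # obs: list of observed symbol indices
        alpha = pi * B[:, obs[0]]                # forward recursion base case
        for o in obs[1:]:
            alpha = (alpha @ A) * B[:, o]        # alpha_t(j) = sum_i alpha_{t-1}(i) A[i,j] B[j,o_t]
            # (in practice one rescales alpha at each step to avoid underflow)
        return alpha.sum()                       # Pr(obs) = sum_j alpha_T(j)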
WEEK 10 (Mar 11):
SPRING BREAK
WEEK 11 (Mar 18):
GRAPHICAL MODELS FOR PROBABILISTIC INFERENCE
WEEK 12 (Mar 25):
INFERENCE IN BAYES NETS
Problem Set #7 (due Mon Apr 1)
[Postscript]
[PDF]
MK tutorial covering inference in Bayes nets and reinforcement learning
[Postscript]
[PDF]
WEEK 13 (Apr 1):
BAYES NETS AND MDPs
Paper: "An Introduction to Variational Methods for Graphical Models", by M. Jordan, Z. Ghahramani, T. Jaakkola, and L. Saul
[Postscript]
[PDF]
Paper: "Variational Methods and the QMR-DT Database", by T. Jaakkola and M. Jordan
[Postscript]
[PDF]
Paper: "Large Deviation Methods for Approximate Probabilistic Inference, with Rates of Convergence", by MK and LS
[Postscript]
[PDF]
WEEK 14 (Apr 8):
INTRODUCTION TO GAME THEORY
WEEK 15 (Apr 15):
STOCHASTIC AND GRAPHICAL GAMES