CIS 700/04
ADVANCED TOPICS IN
MACHINE LEARNING
FALL 2003
URL for this page: http://www.cis.upenn.edu/~mkearns/teaching/cis700
INSTRUCTOR
Prof. Michael Kearns
mkearns@cis.upenn.edu
Office hours: By appointment
COURSE LOCATION AND TIME
Tuesday and Thursday 12-1:30, in
307 Levine Hall. PLEASE NOTE
CHANGE IN ROOM.
COURSE DESCRIPTION
This seminar course will examine selected recent developments in
machine learning research from the literature. More detail will be
provided as the term develops; the topics covered so far appear in the
schedule below. I am also open to suggestions for both topics and specific readings.
COURSE FORMAT AND REQUIREMENTS
The course will be run in seminar/reading group format. Each week we will
examine a couple of papers, and each session will have one or more participants
who will lead an informal discussion of the chosen material (not give a formal
presentation). I will probably also use the course as a venue for external speakers.
Class participation will determine approximately half of the course grade, with the other half based on a course project.
Auditors are welcome, as are sporadic and one-time attendees who drop in for subjects that interest them.
PREREQUISITES
Familiarity with the basics of machine learning, statistics and
analysis of algorithms.
SCHEDULE AND READINGS
Th Sep 4: We'll start with a mainly organizational meeting to discuss possible topics for the term, and to start plotting the schedule and lining up volunteers.
Tu Sep 9, Th Sep 11: Support Vector Machines and Kernel Methods
I will start things off by leading discussions of some portions of the
papers below. There is likely to be lots of overlap between the first three items,
which are all tutorials, so at a minimum I'd like to suggest that everyone take
a good look at at least one of them.
Tu Sep 16, Th Sep 18: SVMs Continued
We'll finish up our study of SVMs with a couple of more application-oriented
papers. Fernando Pereira graciously offered to lead the discussion on the
second paper below.
Tu Sep 23, Th Sep 25: Clustering, and a Special Seminar
Tu Sep 23: Jon Kleinberg has a really thought-provoking
impossibility result that we will examine, led by
Nick Montfort.
For Th Sep 25, we'll have the following special seminar by Prof. Yishay Mansour, a well-known researcher in computational learning theory, computational game theory, and distributed computation. This talk mixes all of these topics, and can be viewed as a distributed learning result in a game-theoretic setting; a toy simulation sketch of the migration dynamics described in the abstract appears after the announcement below.
Title: Convergence Time to Nash Equilibria for Load Balancing
Abstract:
We study the number of steps required to reach a pure Nash Equilibrium
in a load balancing scenario where each job behaves selfishly and
attempts to migrate to a machine which will minimize its cost.
We consider a variety of load balancing models, including identical,
restricted, related and unrelated machines. Our results have a crucial
dependence on the allowable weights assigned to jobs. We consider
arbitrary weights, integer weights, K distinct weights and identical
(unit) weights. We look both at an arbitrary schedule (where the only
restriction is that a job migrates to a machine which lowers its cost)
and specific efficient schedulers (such as allowing the largest weight
job to move first).
This is joint work with Eyal Even-Dar and Alex Kesselman.
Speaker: Yishay Mansour
Affiliation: Tel-Aviv University
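To make the setting concrete, here is a toy best-response simulation in the spirit of the abstract. This is my own sketch, not code from the paper: it assumes identical machines, a made-up weight vector and machine count, and the "largest job moves first" scheduler mentioned in the abstract.

# Toy best-response simulation of selfish load balancing on identical machines
# (an illustrative sketch, not code from the paper). A job's cost is the load of
# its machine; a job migrates only if the move strictly lowers its cost, and the
# largest-weight job is allowed to move first. The process stops at a pure Nash
# equilibrium. Weights, machine count, and the random initial schedule are made up.
import random

def best_response_dynamics(weights, n_machines, seed=0):
    random.seed(seed)
    assignment = [random.randrange(n_machines) for _ in weights]  # arbitrary initial schedule
    loads = [0.0] * n_machines
    for job, machine in enumerate(assignment):
        loads[machine] += weights[job]

    steps = 0
    while True:
        moved = False
        # scan jobs in order of decreasing weight ("largest job moves first")
        for job in sorted(range(len(weights)), key=lambda j: -weights[j]):
            current = assignment[job]
            # cost on machine m = load of m after the job joins it
            target = min(range(n_machines),
                         key=lambda m: loads[m] + (0 if m == current else weights[job]))
            if target != current and loads[target] + weights[job] < loads[current]:
                loads[current] -= weights[job]
                loads[target] += weights[job]
                assignment[job] = target
                steps += 1
                moved = True
                break  # re-scan from the largest job after every improving move
        if not moved:
            return assignment, steps  # no job can improve: a pure Nash equilibrium

print(best_response_dynamics([7, 5, 4, 3, 3, 2, 1], n_machines=3))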
Tu Sep 30, Th Oct 2, Tu Oct 7: Clustering Continued
Sanjoy Dasgupta has contributed a number of fundamental insights and results to
clustering over the past few years. Sham Kakade has also suggested we look
at the related Vempala/Wang paper below.
Th Oct 9: Special Talk --- Prof Lawrence Saul
Prof Lawrence Saul
of CIS will give a special talk on very recent work
directly related to our clustering investigation. His abstract is below.
I also strongly encourage people to hit
Sebastian Seung's talk at IRCS, Fri Oct 10 at noon.
He is a terrific speaker.
TITLE: NONLINEAR DIMENSIONALITY REDUCTION BY SEMIDEFINITE PROGRAMMING
ABSTRACT:
Can we detect low dimensional structure in high dimensional
data sets? The problem of dimensionality reduction is
fundamental to machine learning, pattern recognition,
and statistics. In this talk, I will describe a new solution
to this problem based on semidefinite programming. The resulting
algorithm can be used to analyze high dimensional data
that lies on or near a low dimensional manifold. The algorithm
overcomes various limitations of previous work in manifold
learning (e.g., Isomap, LLE, graph Laplacians). It also bridges two
recent developments in machine learning: semidefinite programming for
learning kernel matrices and spectral methods for nonlinear
dimensionality reduction.
Joint work with Kilian Weinberger (U Penn).
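For intuition, here is a rough sketch of the kind of algorithm the abstract describes: learn a centered kernel matrix by semidefinite programming that preserves local distances while maximizing variance, then take a spectral embedding from its top eigenvectors. This is my own reconstruction, not the speakers' implementation; the k-nearest-neighbor rule, the value k=4, and the use of numpy/cvxpy are my assumptions.

# Rough sketch of nonlinear dimensionality reduction via semidefinite programming
# (illustrative only, not the speakers' code). Requires numpy and cvxpy.
import numpy as np
import cvxpy as cp

def sdp_embedding(X, k=4, dim=2):
    n = len(X)
    # pairwise squared Euclidean distances in the original high-dimensional space
    D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    neighbors = np.argsort(D, axis=1)[:, 1:k + 1]      # k nearest neighbors of each point

    K = cp.Variable((n, n), PSD=True)                  # kernel (Gram) matrix to be learned
    constraints = [cp.sum(K) == 0]                     # center the embedding at the origin
    for i in range(n):
        for j in neighbors[i]:
            # preserve local geometry: ||y_i - y_j||^2 = ||x_i - x_j||^2 for neighbors
            constraints.append(K[i, i] - 2 * K[i, j] + K[j, j] == D[i, j])
    # maximize the total variance of the embedding ("unfold" the manifold)
    cp.Problem(cp.Maximize(cp.trace(K)), constraints).solve()

    # spectral step: top eigenvectors of the learned kernel give the coordinates
    vals, vecs = np.linalg.eigh(K.value)
    top = np.argsort(vals)[::-1][:dim]
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))

# toy usage: 30 noisy points near a curve in R^3, embedded into the plane
t = np.linspace(0, 3, 30)
X = np.c_[np.cos(t), np.sin(t), t] + 0.01 * np.random.randn(30, 3)
print(sdp_embedding(X).shape)   # (30, 2)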
Th Oct 16, Tu Oct 21: Clustering Data Streams
Prof. Sudipto Guha will lead us through recent algorithms for on-line clustering of
data streams. More references to follow.
Th Oct 23, Tu Oct 28, Th Oct 30: Ranking
Libin Shen
will lead us through some of the many recent interesting papers on the problem
of learning ordinal rankings of alternatives.
Tu Nov 4, Th Nov 6: Dimensionality Reduction Revisited
Kilian Weinberger
will take us through a set of related papers on dimensionality
reduction that are closely related to his ongoing work with Lawrence Saul
on the topic.
Tu Nov 11, Th Nov 13: Boosting, Online Learning, and Game Theory
Kobby Essien will lead us through a set of related papers on online prediction and game-theoretic
learning frameworks.
Th Nov 20 and Tu Dec 2: Approaches to Multiclass Learning
Yuan Ding
will lead our study of algorithms and models for
dealing with multiclass learning problems, many of which have a coding-theoretic element.
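As one concrete example of that coding-theoretic flavor, here is a minimal sketch of error-correcting output codes: each class gets a binary codeword, one binary classifier is trained per code bit, and a test point is assigned to the class whose codeword is nearest in Hamming distance to the predicted bit string. The random code, the 15-bit code length, and the logistic-regression base learner are my illustrative choices, not necessarily those in the readings.

# Minimal sketch of error-correcting output codes (ECOC) for multiclass learning.
# Illustrative only; the random code and base learner are my own choices.
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_ecoc(X, y, n_bits=15, seed=0):
    classes = np.unique(y)
    rng = np.random.default_rng(seed)
    code = rng.integers(0, 2, size=(len(classes), n_bits))   # one binary codeword per class
    for b in range(n_bits):
        # re-draw any bit that fails to split the classes into two non-empty groups
        while code[:, b].min() == code[:, b].max():
            code[:, b] = rng.integers(0, 2, size=len(classes))
    learners = []
    for b in range(n_bits):
        bit_labels = code[np.searchsorted(classes, y), b]     # relabel each example by bit b of its class
        learners.append(LogisticRegression(max_iter=1000).fit(X, bit_labels))
    return classes, code, learners

def predict_ecoc(model, X):
    classes, code, learners = model
    bits = np.column_stack([clf.predict(X) for clf in learners])
    hamming = (bits[:, None, :] != code[None, :, :]).sum(-1)  # distance to every codeword
    return classes[hamming.argmin(axis=1)]

# toy usage on the three-class iris data
from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
model = train_ecoc(X, y)
print((predict_ecoc(model, X) == y).mean())   # training accuracy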
Th Dec 4: SPECIAL TALK
Title: Dimension reduction and information theory - what's the connection?
Speaker: Naftali Tishby, The Hebrew University
Abstract:
In this talk I will take a new look at Shannon's information theory from a machine learning perspective. I will argue that Shannon's theory provides a compelling mechanism for quantifying the fundamental tradeoff between complexity and accuracy, by unifying the source and channel coding theorems into one principle which I call the "Information Bottleneck" (IB). This unified view of the coding theorems can shed new light on the question of "relevant data representation" and suggests new algorithms for extracting such representations from co-occurrence statistics.
In the second part I will briefly examine two different applications of this principle to continuous dimension reduction and feature extraction from high dimensional datasets: Sufficient Dimensionality Reduction (SDR) and Gaussian IB. Both provide nonlinear continuous dimension reduction that preserves relevant information.
More information and other related papers can be found at http://www.cs.huji.ac.il/labs/learning/Papers/IBM_list.html
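For a sense of what the IB principle looks like operationally, here is a tiny numerical sketch of the standard self-consistent (iterative) IB updates on a discrete joint distribution p(x, y). This is my own illustration of the general principle, not code from the speaker; the cluster count, the value of beta, and the toy distribution are made-up choices.

# Tiny sketch of iterative Information Bottleneck updates on a discrete joint
# distribution p(x, y): compress X into clusters T while preserving information
# about Y. Illustrative only; beta, cluster count, and the toy data are my choices.
import numpy as np

def information_bottleneck(p_xy, n_clusters=2, beta=5.0, iters=200, seed=0):
    p_xy = p_xy / p_xy.sum()
    p_x = p_xy.sum(axis=1)                                   # marginal p(x)
    p_y_given_x = p_xy / p_x[:, None]                        # conditional p(y|x)
    rng = np.random.default_rng(seed)
    p_t_given_x = rng.dirichlet(np.ones(n_clusters), size=len(p_x))  # random soft assignment

    for _ in range(iters):
        p_t = p_t_given_x.T @ p_x                            # p(t) = sum_x p(x) p(t|x)
        p_y_given_t = (p_t_given_x * p_x[:, None]).T @ p_y_given_x / (p_t[:, None] + 1e-12)
        # KL divergence D( p(y|x) || p(y|t) ) for every pair (x, t)
        ratio = (p_y_given_x[:, None, :] + 1e-12) / (p_y_given_t[None, :, :] + 1e-12)
        kl = (p_y_given_x[:, None, :] * np.log(ratio)).sum(-1)
        # self-consistent update: p(t|x) proportional to p(t) * exp(-beta * KL)
        p_t_given_x = p_t[None, :] * np.exp(-beta * kl)
        p_t_given_x /= p_t_given_x.sum(axis=1, keepdims=True)
    return p_t_given_x

# toy joint distribution: x in {0,1,2,3}, y in {0,1}; the first two x values
# predict y=0 and the last two predict y=1, so IB should group them into two clusters
p_xy = np.array([[0.25, 0.0], [0.25, 0.0], [0.0, 0.25], [0.0, 0.25]])
print(information_bottleneck(p_xy).argmax(axis=1))           # e.g. [0 0 1 1] or [1 1 0 0]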
Possible Future Material