PENN CIS 620, FALL 2006: SEMINAR ON SPONSORED SEARCH
Prof Michael Kearns
mkearns@cis.upenn.edu
Mondays 12-3 PM
307 Levine Hall
COURSE DESCRIPTION
Keyword auctions for sponsored search, such as those offered by Google and Yahoo!, have become profitable and influential. They are also a fascinating new kind of market --- namely, a market in natural language phrases. While the structure of these new markets is evolving rapidly, there is also a diverse and nascent science emerging around the challenges and issues they raise. In this seminar course we will attempt to survey the many academic and commercial views of sponsored search that are under rapid development. The list of candidate topics includes an examination of the details of the various commercial auction mechanisms, their pricing schemes, and the analytic tools and data avaiable to advertisers to track and optimize their keyword portfolios; game-theoretic analyses of the equilibrium and other properties of keyword auctions; and interactions between the economic/financial and natural language aspects of sponsored search. See Candidate Topics below for more details.
Since statistical language processing, theoretical computer science, game theory and economics, and machine learning are all topics of central importance to sponsored search, it is my hope that we will have strong representation from each of Penn's excellent groups in these areas. The interests of course participants will play a major role in determining the precise content.
COURSE FORMAT AND REQUIREMENTS
The seminar will be run as an informal and conversational reading/research group. We will meet once a week for an extended period (Mondays 12-3) to encourage discussion and brainstorming. Requirements of those taking the course for credit consist of active participation and presentation in the meetings, and a course project. Students will be expected to work with me to select some number of papers or topics to lead the group through during one or more meetings. More details will emerge gradually in the Detailed Meeting Schedule.
For the course projects, I'd like most of these to be "online" in the sense that they are conducted during the term, not after or at the end, and thus can be shared with the group gradually as they develop. I am hoping that some number of these projects might include empirical experiments with data from one of the keyword auctions. For potential ideas, see Project Ideas below.
We also have a number of very interesting guest speakers/visitors to the seminar; see the Detailed Meeting Schedule for details as they emerge.
This is a new field, and there are no absolute prerequisites to the course. However, I expect the ability to follow a mathematical argument to be handy at various points (e.g. we will examine some formal game-theoretic analyses of keyword auctions), as will some basic knowledge of algorithm design and analysis, statistics, natural language processsing, game theory and economics. General web savvy will also be helpful. Overall the spirit will be to make things self-contained so that everyone can follow. Again, check out Candidate Topics below for just a sample of the type of material I am hoping to cover.
Auditors and occasional participants are welcome. Strong undergraduates are also encouraged to join.
DETAILED MEETING SCHEDULE
Mon Sep 11
Organizational and introductory meeting. We'll introduce ourselves, I'll overview
much of the material that is collected on this web site,
and we'll brainstorm on topics and
material we'd like to cover in the course.
Everyone should come to the first meeting with basic familiarity with
sponsored search auctions, which can be easily obtained by sampling some of the
many readings and materials on this site. You can also look at
"Sponsored Search: A Brief History"
by Fain and Pedersen. Informal outline of what we'll cover:
Mon Sep 18
Material for discussion:
Some course mechanics announcements:
If you're taking the seminar for credit, please look at the announcement
regarding projects under the Sep 25 entry (next week).
Also, you'll notice that due to new visitors/speakers, the sessions are already
filled until the Oct 30 gathering. Based on the remaining slots and the
current number of registered students, this probably means that everyone
will have to run a session at most once, and in fact you can/should probably work
in teams. The basic protocol will be that I will determine the approximate
content I'd like covered in advance, and then will request volunteers for
the
date in question.
Mon Sep 25
This week
Ryan Gabbard
will lead us through an investigation of the various services and tools provided to
advertisers by Google and Yahoo!, as well as those provided by third parties.
As preparation, please go and do some experimentation with the following tools:
IMPORTANT ANNOUNCEMENT FOR THOSE TAKING THE SEMINAR FOR CREDIT:
In order to get people thinking about their course projects early, all seminar credit
participants should send me (via email) an initial proposal(s) for your project by
Monday, Sep 25. This is just
the first round, so you can have more than one distinct proposal, and they can be brief.
Mon Oct 2
This week we have guest speaker
Chris Dixon,
founder of
McAfee's
SiteAdvisor,
an interesting web trust and anti-fraud service. He will talk
about various kinds of web fraud, including those directly
related to sponsored search, such as click fraud.
Readings (please check for updates):
Of general interest to the seminar: at the ACM Electronic Commerce conference this
past summer, there was an interesting panel discussion on models for sponsored
search; a transcript has recently been made available.
Mon Oct 9
Guest speaker/visitor
Alon Altman.
An Axiomatic Approach to Ranking Systems
Abstract: This talk will survey our recent work in applying the axiomatic approach to
ranking systems. Ranking systems are systems in which agents rank each other
to produce a social ranking. In the axiomatic approach we study ranking
systems under the light of basic properties, or axioms. In this talk I will
present our axiomatization theorem for the PageRank ranking system, prove an
impossibility and possibility result for general ranking systems, and
discuss the issue of incentives in ranking systems. Finally, I will show
initial results regarding personalized ranking systems, where a specialized
ranking is generated for each agent.
IMPORTANT ANNOUNCEMENT FOR THOSE TAKING THE SEMINAR FOR CREDIT:
In order to get the creative juices flowing for empirical projects, I am giving
a little assignment required of all students in the seminar. You should have
all received Kuzman's Script in an earlier email (let me know if you need a resend).
You should all use this script to obtain data for some moderately large
(say, at least in the dozens) set of phrases of your own choosing. Presumably these
phrases will be "related" in some interesting way (e.g. the names of all 50 states).
Please try to present an analysis of the data, what explains similarities,
differences, rankings, etc. I will discuss this
example in Excel
in class today to help clarify.
I'd like everyone to send me their brief analysis by this coming Sunday night,
Oct 15, for discussion in next Monday's session.
Mon Oct 16
This time we'll hear from
Kartik Hosanagar,
who is both a Wharton faculty member and closely involved with a
very interesting startup called
NatPal,
which he will tell us about. Because they do large-scale keyword auction
bidding on behalf of many clients, NatPal addresses many
important components of sponsored search, including various keyword and
portfolio optimization problems, bid aggregation, and supply-demand matching.
NatPal may also provide us some data for course projects, so we may hear about this
as well.
Mon Oct 23
Fall break; no meeting
Mon Oct 30
This week we get a brief break from our flurry of guest speakers; so we'll cover
the following paper:
Please read the paper in advance of our meeting. Our discussion leaders
will be Qian Liu and Michael Zavlanos. The paper will probably not consume our entire
meeting, so we will then go on to take a look at the
empirical studies
many of you
recently did using Kuzman's Script; please be prepared to say a little about yours.
Mon Nov 6
Today we will have guest speaker/visitor
David Pennock
of
Yahoo! Research.
Here is a description from David about what he will cover; please read the
papers he cites in advance:
I will give a brief and biased history of the sponsored search industry.
I will survey some of the sponsored search research being conducted in
the microeconomics group at Yahoo! Research. I will summarize some of
the activities at the annual Workshops on Sponsored Search Auctions.
Among other topics, I will discuss these papers:
Mon Nov 13
This week we will be led by Kuzman Ganchev, Alex Kulesza, and Jenn Wortman through the
following papers, but concentrating on the sections cited; please read accordingly.
The common theme is bidding strategies under extended models (budgets, multiple keywords, dynamic behavior).
Mon Nov 20
Today's meeting will be devoted to the discussion and development of course projects.
Everyone is encouraged to come and contribute, but
attendance and participation are required of registered students.
More specifically, all registered students should be prepared to make a brief presentation in
this session on their proposed course project. If you feel you already have an independent
project idea that you have discussed with me and obtained approval for, you should present
that idea to the group. You should clearly frame the question(s) you are investigating, motivate
them, describe your planned methods (theoretical or experimental), etc. Each presentation
will be followed by brief discussion and suggestions from the group.
Alternately, you will all recall that in our Oct 23 meeting (see above), we examined some of the
many interesting empirical experiments you did using Kuzman's Script, and I suggested that
another possibility was a group of us joining forces for a larger project and possible paper
along similar lines. If you are interested in joining such an effort, you should come prepared
with some suggested empirical experiments for group discussion.
To put some structure/discipline around all this, regardless of which category you fall
into above, I'd like everyone to send me a couple of slides we can put up for discussion
during the meeting;
please send these to me by sometime on
Sunday, Nov 19.
Finally, once we have settled the project ideas in this meeting, I expect at least some
nontrivial work to begin on them before the end of the term, so our final meeting on
Dec 4 will be devoted to progress reports on the projects.
ADDENDUM posted Nov 15:
Due to a death in my family last night,
I will be flying to CA this
weekend and not returning until later Monday. So though I will
not be present, I would still like everyone to carry on with
the plan outlined above
--- we need to get working on the projects.
Fernando has generously offered to informally oversee
the proceedings on Monday. I suggest having people who are
planning on doing solo projects present first, followed by
the brainstorming and presentation of ideas for the larger
group project. In all cases the point is to generate ideas
and get feedback, and for the group project, I'd like you
all to generate a reasonably specific plan of action and
division of labor.
In addition to sending your materials to me by Sunday night
so I can see what everyone is planning, please send them
to Jenn (wortmanj@seas.upenn.edu) so she can put them all
on a common machine for projection on Monday.
Mon Nov 27
Today we have the fascinating
Ben Edelman
as our visitor and speaker for a real extravaganza. First, Ben will present a more
formal talk
11 AM Mon Nov 27 in 307 Levine.
This talk is open to the general community, but should also be of specific interest
to Sponsored Search Seminar participants.
Here is the abstract for that talk:
Adverse Selection in Online "Trust" Certifications
Ben Edelman
Widely-used online "trust" authorities issue certifications without
substantial verification of the actual trustworthiness of recipients.
Their lax approach gives rise to adverse selection: The sites that seek
and obtain trust certifications are actually significantly less trustworthy
than those that forego certification. I demonstrate this adverse selection
empirically via a new dataset on web site characteristics and safety. I find
that TRUSTe-certified sites are more than twice as likely to be
untrustworthy as uncertified sites, a difference which remains statistically
and economically significant when restricted to "complex" commercial sites.
I also present analogous results of adverse selection in search engine
advertising - finding ads at leading search engines to be more than twice
as likely to be untrustworthy as corresponding organic search results for
the same search terms.
(Here is a
related paper. )
Following the talk above, Ben will continue and present in our usual
informal seminar setting. Here is a description of what he plans to cover:
Guest speaker Ben Edelman will present his work on equilibria in sponsored search:
Estimates of search engine revenues
from unstable allocations,
theoretical derivation of one equilibrium, and
new simulation results to test
convergence as well as to compare bidder outcomes and to compute policy
counterfactuals. What happens to search engine revenues and advertiser
surpluses if a search engine changes its minimum bid rule? Reduces the
number of ads shown in results? Changes the relative prominence of ads
in different positions? And if you had to choose a strategy for a PPC
advertiser, what strategy would you recommend? Ben's simulations attempt
to offer some answers. Ben will also show us a bit of the darker side of sponsored search.
Ben has caught spyware vendors showing Yahoo PPC ads in
troubling and allegedly prohibited ways -- like
covering merchants' sites with their own ads.
He also has video
proof of spyware performing click-fraud.
We'll look at some of these examples, and
we'll untangle the relationships that have created this mess. Ben may
also tell us a bit about Draucker
Development et al. v. Yahoo (in which he serves as co-counsel), advertiser class action
litigation claiming that Yahoo breached its contract with advertisers when it
put ads into spyware, typosquatting sites, untargeted banner ads, and other
unsavory placements.
Mon Dec 4
It has been a great ride, and this is our final meeting.
As promised, we will devote it to progress reports on course
projects; please send me your materials for projection by
Sunday night. For those not doing projects, I am also happy if anyone
wants to share any final thoughts on sponsored search, research ideas,
and so on. Again, just send me anything you want projected and I will
consolidate.
Here is the
directory of project materials.
FINAL POST
Course project write-ups are due to me via email Wed Dec 20. I look forward to
reading them. Thanks to all of you for participating in the seminar --- I hope you
all learned as much as I did. Happy Holidays!
CANDIDATE TOPICS AND MATERIAL
What follows below is my attempt to organize thoughts and material on
some sponsored search topics I find interesting. It is by no
means a definitive or exhaustive list of what we will examine. In some places
I have not yet found much or any literature on the imagined topic, but think there
might (or should) be some. My anticipation is that this list will grow and evolve as
we discover more topics and material, and that much of the material will eventually
be incorporated into the
Detailed Meeting Schedule
above.
The Advertiser's View.
Given that our investigation will include not just the science but also the practice of
sponsored search, it seems natural to begin with a detailed look at what sponsored search
really is, and how it works. In particular, what is it like to be an advertiser using
sponsored search? We'll look at the portions of advertiser documentation provided
by the various keyword auctions that describe how the auctions are run, how prices and
charges are set, and related topics. We'll also examine the extensive tools available to advertisers from
both the auctioneers and third parties for keyword performance tracking, keyword selection,
bid placement and management, and other functionality. A central focus here will be
what data is available to advertisers across the different auctions.
Some sample material:
Game-Theoretic Views of Keyword Auctions.
The field of algorithmic mechanism design has exploded within the last several years, and there is
now a body of literature on the algorithmic and game-theoretic properties of sponsored search in
particular (sometimes referred to as "slot auctions"). We'll sample selectively in this growing
area, including formal equilibrium analyses of existing keyword auction mechanisms, examinations
of alternative mechanisms, and empirical analyses.
Some sample material:
Sponsored Search and Natural Language.
While there is of course a very large literature on natural language search and information retrieval,
and a smaller but growing one on the economic aspsects of keyword auctions, my limited survey so far
suggests to me that there is a dearth of works truly in the intersection of these two topics. How
can we best blend the tools of economics and natural language processing/IR to understand
sponsored search? As a simple and concrete instance of the type of question one might ask, how would we
define and detect when two terms synonymous with each other in the semantic sense are "mispriced" by
the market relative to each other? (See Project Ideas below.)
Some sample material:
The Sponsored Search Jungle.
The topics above mainly focus on keyword auctions and their advertisers. But the success of the
sponsored search markets and the web more generally has spawned an elaborate ecosystem in which
all manner of methods are employed by advertisers, search engines and others to advance their own
interests. This jungle gives rise to phenomena like link farms, web spam, click fraud and many
others --- some of it intentional, some of it not, and much of it "undesirable" in one or another
sense. We should spend some time examining the broader environment and impacts of sponsored search.
Some sample material:
SEM vs. SEO.
Search engine optimization (SEO) refers to the process of tailoring a web site to optimize
its (unpaid, or "left side", or "organic") ranking for a given set of keywords or phrases.
Search engine marketing (SEM) refers to paid (or "right side") or sponsored search. While
our emphasis will be on SEM, it is inevitable that we consider some SEO issues and methods
as well, in particular because advertisers engage in both kinds of activity.
I haven't
gotten around to assembling SEO material yet, but here seems to be
one extensive site and
I am sure we can find more together.
Sponsored Search and Traditional Finance.
It is interesting to think about whether ideas and methods from traditional finance can
be imported to sponsored search in a useful way. For instance, how
could one apply classical mean-variance portfolio optimization ideas to keyword
auctions? How could one support a secondary market for keywords, and who would
benefit? What if the keyword auctions offered options, so that for a price a
florist could lock in now their position for the term "flowers" for the week preceding
Mother's Day 2007?
Some sample material:
None found/posted yet.
Other Topics and Material.
A number of readings suggested by David Pennock (need to assemble links and
integrate with above):
PROJECT IDEAS
Mispricing in Keyword Auctions.
In traditional finance, there are many settings in which it is possible to define
and detect "mispricings" between two or more investment instruments that are directly
or indirectly related to each other; perhaps the canonical example is that of the
current (spot) price for the share of a stock and the price of an option to buy or
sell that stock at a fixed price and future time (the subject of the classical
Black-Sholes theory). Mispricings, or more accurately the absence of them, are one
measure of the "efficiency" of a given market. What are the analogues of this line
of thought in sponsored search? One direction is to view different keywords as being
related to each other via their semantic (definitional) or usage aspects. A
semi-specific project would be to compile sets of semantically or operationally
synonymous words/phrases, and empircally investigate how prices vary across
synonymous terms. A next step would be to measure how much of ths variance can
be explained by overall search popularity of the terms.
Discovering Supply-Demand Imbalances.
(Suggested by
Chris Dixon )
One can view the "demand" for a given search term as being roughly represented by
the volume and quality of people searching on it (where quality might be measured by
how much searchers spend), and the "supply" as being roughly represented by the
volume and quality of advertisers purchasing the search term (where quality might be measured
by whether the advertisers provide relevant/good products). An interesting project would
quantify these vague notions and empirically investigate the nature and extent of
search terms for which there are supply-demand imbalances.
Risk Mitigation Services for Advertisers.
(Based on conversations with David Pennock and John Langford of Yahoo!)
Suppose you are a Philadelphia florist; you might be quite willing to
pay some amount now for the guarantee that during the week leading up to
Mother's Day 2007, your bid of 50 cents per click on the phrase
"Philadelphia flowers" will be one of the top four sponsored search
results. We can think of this guarantee as an option --- the option
to purchase a spot in the top four sponsored results at a future
time for a fixed price. If (say) the top bid in the week before Mother's
day is only 25 cents, you obviously won't exercise this option; if the
fourth highest bid is a dollar, you will. (For simplicity here we are
ignoring "quality" components to the ranking of sponsored ads.)
What can we say about how
to price such options? Can we import or apply the classical theory
of options pricing in finance such as Black-Scholes, and how?
Note that although such options are not offered by any of the major
sponsored search auctions (yet), it is interesting to think about
the issues involved in setting up an independent service offering them.
Such a service would enter option agreements with advertisers and
then be responsible for meeting the resulting guarantees on behalf
of those advertisers, bidding whatever was necessary to meet them,
possibly at a loss. From the perspective of this service, the goal
is to profit from the combination of correct pricing of the options
on average and the aggregation of risk across many advertisers.
Another interesting idea is the use of independent information markets
as a hedge against future price movement. Suppose the same Philadelphia
florist knows that it can make profits only up to 50 cents per click,
and is worried that in the future the competitive prices for the
sponsored results on "Philadelphia flowers" will greatly exceed this
limit. If there were a market in which participants could simply
bet on the rise or fall of sponsored search prices, the florist could
invest in the event that "Philadelphia flowers" will appreciate in
value (that is, take a long position of shares in that phrase in the
information market). If the florist gets priced out of the advertising
market in the future, they will at least profit in the information
market. See the
Tech Buzz Game
for an interesting example of a similar information market.
"Agent-Based" Analysis of Sponsored Search.
During Ryan Gabbard's
presentation of third-party advertiser services
, there
was a brief discussion of the various heuristics available to users of
Atlas' BidManager.
One can easily imagine a future (or present?) in which the vast majority of
bidding is done not by "fully rational" agents, but by large populations of
such simple agents. One could then imagine doing modified equilibria analysis
in which the strategies available to advertisers are limited to such
heuristics. Such analysis could be either theoretical or based on simulations,
the latter possibly informed by empirical auction data.
Department of Economics
Harvard University