Spectral Methods for Modeling Language
We use spectral methods (SVD) to build statistical language
models. The resulting vector models of language are then used to
predict a variety of properties of words, including their entity type
(e.g., person, place, organization, ...), their part of speech, and
their "meaning" (or at least their word sense). Canonical Correlation
Analysis (CCA), a generalization of Principal Component Analysis (PCA),
gives context-oblivious vector representations of words. More
sophisticated spectral methods are used to estimate Hidden Markov
Models (HMMs) and generative parsing models.
These methods give state estimates for words and phrases based on
their contexts, and probabilities for word sequences, which
in turn can be used to improve performance on many NLP tasks.
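As a rough illustration of the CCA step (this is a minimal sketch, not the code used in these papers; the toy data, matrix names, and dimensions are illustrative assumptions), the following builds two views of each word token, a one-hot word view and a context-count view, and recovers context-oblivious word vectors from an SVD of the whitened cross-covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: each row is a word token described by two views.
# W: one-hot indicator of the token's word identity (the "word" view).
# C: counts of the words appearing near the token (the "context" view).
n_tokens, vocab_size, k = 1000, 50, 10
word_ids = rng.integers(0, vocab_size, n_tokens)
W = np.zeros((n_tokens, vocab_size))
W[np.arange(n_tokens), word_ids] = 1.0
C = rng.poisson(0.2, size=(n_tokens, vocab_size)).astype(float)

# Center both views.
W = W - W.mean(axis=0)
C = C - C.mean(axis=0)

def inv_sqrt(S, eps=1e-3):
    """Regularized inverse square root of a symmetric PSD matrix."""
    vals, vecs = np.linalg.eigh(S + eps * np.eye(S.shape[0]))
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

Cww = W.T @ W / n_tokens          # within-view covariances
Ccc = C.T @ C / n_tokens
Cwc = W.T @ C / n_tokens          # cross-covariance between the views

# CCA reduces to an SVD of the whitened cross-covariance matrix.
U, s, Vt = np.linalg.svd(inv_sqrt(Cww) @ Cwc @ inv_sqrt(Ccc))

# Rows of this matrix are k-dimensional, context-oblivious word vectors,
# one per word type in the vocabulary.
word_vectors = inv_sqrt(Cww) @ U[:, :k]
```

Because the word view is a one-hot indicator, each row of the resulting projection matrix serves directly as the embedding of one word type; the published methods add further steps (e.g., the two-step CCA of the ICML 2012 paper) on top of this basic construction.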
Core to this work is the eigenword,
a real-valued vector associated with a word that captures its meaning,
in the sense that distributionally similar words have similar
eigenwords. Eigenwords are computed as the singular vectors of the
co-occurrence matrix of words and their contexts.
They can be context-oblivious (the vector does not depend
on the word's context, only on
the word) or context-sensitive (the vector depends on the context).
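A minimal sketch of this computation, using a toy corpus, a one-token context window, and dimension k = 3 as illustrative assumptions (not our released software), is:

```python
import numpy as np

# Toy corpus; in practice the co-occurrence counts come from a large corpus.
corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

# Word-context co-occurrence matrix: rows are words, columns are the words
# observed in a +/-1 token window around them.
M = np.zeros((V, V))
for t, w in enumerate(corpus):
    for u in (t - 1, t + 1):
        if 0 <= u < len(corpus):
            M[idx[w], idx[corpus[u]]] += 1

# Eigenwords are the singular vectors of this co-occurrence matrix.
U, s, Vt = np.linalg.svd(M, full_matrices=False)
k = 3

# Context-oblivious representation: one k-dimensional vector per word type.
eigenwords = U[:, :k] * s[:k]

# Context-sensitive representation: project a particular token's observed
# context counts onto the right singular vectors (here, a token whose
# neighbor happens to be "cat").
context_counts = np.zeros(V)
context_counts[idx["cat"]] += 1
context_state = Vt[:k] @ context_counts
```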
For more information:
- Eigenword collections and software
- Using Regression for Spectral Estimation of HMMs, SLSP 2013: Jordan Rodu, Dean P. Foster, Weichen Wu, and Lyle H. Ungar
- Experiments with Spectral Learning of Latent-Variable PCFGs, NAACL 2013: Shay Cohen, Karl Stratos, Michael Collins, Dean P. Foster, and Lyle Ungar
- Multi-View Learning of Word Embeddings via CCA, NIPS 2011: Dhillon, Foster, and Ungar
- Spectral Dimensionality Reduction for HMMs, arXiv 2012: Foster, Rodu, and Ungar
- Spectral Learning of Latent-Variable PCFGs, ACL 2012: Cohen, Stratos, Collins, Foster, and Ungar
- Spectral Dependency Parsing with Latent Variables, EMNLP-CoNLL 2012: Dhillon, Rodu, Collins, Foster, and Ungar
- Two Step CCA: A New Spectral Method for Estimating Vector Models of Words, ICML 2012: Paramveer Dhillon, Jordan Rodu, Dean Foster, and Lyle Ungar (with supplemental material)
Our 2013 NAACL tutorial on Spectral Learning Algorithms for Natural Language Processing has more references.
Key Collaborators
home: ungar@cis.upenn.edu