Eigenwords and Eigencontexts
An eigenword is a real-valued vector "embedding" associated with a word that
captures its meaning, in the sense that distributionally similar words have
similar eigenwords.
Eigenwords are context-oblivious: the vector depends only on the word, not on
the word's context. One can also estimate a context-sensitive vector for each
token (an eigentoken, or the state of an HMM), where the vector for each
token depends on its context.
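To make "distributionally similar words have similar eigenwords" concrete, the
following is a minimal sketch of comparing two context-oblivious eigenword
vectors with cosine similarity. The 30-dimensional vectors below are random
placeholders standing in for real eigenwords, not values from the release.

    import numpy as np

    def cosine(u: np.ndarray, v: np.ndarray) -> float:
        """Cosine similarity between two eigenword vectors."""
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    # Toy 30-dimensional "eigenwords" (placeholders, not released vectors).
    rng = np.random.default_rng(0)
    v_doctor = rng.normal(size=30)
    v_nurse = v_doctor + 0.1 * rng.normal(size=30)  # distributionally similar word
    v_banana = rng.normal(size=30)                  # unrelated word

    print(cosine(v_doctor, v_nurse))   # high similarity
    print(cosine(v_doctor, v_banana))  # near zero

Because the vectors are context-oblivious, the same comparison applies to any
occurrence of the words; eigentokens would instead assign a different vector
to each occurrence.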
- web eigenwords and (coming soon) eigencontexts
- vocabulary size = 50,000; vector length = 30
- corpus: Google web n-grams (using trigrams); see the loading sketch below
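A sketch of loading the released eigenwords and finding nearest neighbors,
assuming a plain-text format with one word per line followed by its 30 vector
components; the actual file layout may differ, and "eigenwords.txt" is a
placeholder filename.

    import numpy as np

    def load_eigenwords(path: str):
        """Read 'word v1 ... v30' lines into a word list and a matrix."""
        words, rows = [], []
        with open(path, encoding="utf-8") as f:
            for line in f:
                parts = line.split()
                if len(parts) < 2:
                    continue
                words.append(parts[0])
                rows.append([float(x) for x in parts[1:]])
        return words, np.asarray(rows)  # shape (50000, 30) for this release

    def nearest(word: str, words, vecs: np.ndarray, k: int = 5):
        """k nearest eigenwords by cosine similarity."""
        unit = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
        sims = unit @ unit[words.index(word)]
        return [words[i] for i in np.argsort(-sims)[1:k + 1]]

    words, vecs = load_eigenwords("eigenwords.txt")  # placeholder path
    print(nearest("king", words, vecs))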
home: ungar@cis.upenn.edu