summary of reverse indexing
To: xtag-meeting@cis.upenn.edu
Subject: summary of reverse indexing
From: Anoop Sarkar <anoop@linc.cis.upenn.edu>
Date: Fri, 19 Jul 2002 20:15:56 EDT
cc: anoop@linc.cis.upenn.edu

Here is a short summary of the new structure of the lexicon that was discussed
during today's xtag meeting:
Currently, the features are stored across the tree database and the syntactic
database:
e.g.: for the betaComps tree,

        S_r
       /   \
   Comp@   S_f*

some features are common across all anchors and are stored with the tree
itself in the .trees file,
e.g.  S_r:<a> = S_f:<a>
In the lexicon, each word carries additional features that are assigned to the
tree,
e.g.  that  betaComps  S_r:<assign-comp> = ind
      for   betaComps  S_r:<assign-comp> = inf
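
As a rough illustration of the current split, the sketch below keeps the shared equations with the tree and the word-specific equations in a separate table keyed by word and tree. The dictionary names and the `features_for` helper are hypothetical and only model the idea; they are not the actual XTAG file formats.

```python
# Hypothetical in-memory model of the current layout (illustrative names only).

# Equations stored with the tree itself (the .trees file): shared by all anchors.
TREE_FEATURES = {
    "betaComps": ["S_r:<a> = S_f:<a>"],
}

# Equations stored in the syntactic database, keyed by (word, tree).
LEXICON_FEATURES = {
    ("that", "betaComps"): ["S_r:<assign-comp> = ind"],
    ("for", "betaComps"): ["S_r:<assign-comp> = inf"],
}


def features_for(word, tree):
    """Collect the equations for one anchored tree from both databases."""
    return TREE_FEATURES.get(tree, []) + LEXICON_FEATURES.get((word, tree), [])


print(features_for("that", "betaComps"))
```

The point of the sketch is that assembling the full feature set for an anchored tree requires consulting both stores.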
The new proposal would change the above to the following:
For the betaComps tree, the features stored *with* the tree will be:
e.g.:
    S_r:<a> = S_f:<a>
    comp-class1 :- S_r:<assign-comp> = ind
    comp-class2 :- S_r:<assign-comp> = inf
And now the lexicon will have no features at all, and in fact will not even
select trees:
e.g.:
    that  comp-class1
    for   comp-class2
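
A minimal sketch of how the proposed layout might be resolved follows. It assumes that, since the lexicon no longer names trees, the trees for a word are found by looking up which trees define the word's classes (a reverse index from class name to tree). That lookup strategy, the data structures, and the function names are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical model of the proposed layout (illustrative names only).
# Each tree stores its shared equations plus named classes of equations.
TREES = {
    "betaComps": {
        "common": ["S_r:<a> = S_f:<a>"],
        "classes": {
            "comp-class1": ["S_r:<assign-comp> = ind"],
            "comp-class2": ["S_r:<assign-comp> = inf"],
        },
    },
}

# The lexicon lists only class names: no features and no tree names.
LEXICON = {"that": ["comp-class1"], "for": ["comp-class2"]}


def build_reverse_index(trees):
    """Map each class name to the names of the trees that define it."""
    index = defaultdict(list)
    for tree_name, tree in trees.items():
        for class_name in tree["classes"]:
            index[class_name].append(tree_name)
    return index


def anchored_trees(word, trees, lexicon):
    """Return {tree name: equations} for every tree reached by the word's classes."""
    index = build_reverse_index(trees)
    result = {}
    for class_name in lexicon.get(word, []):
        for tree_name in index.get(class_name, []):
            tree = trees[tree_name]
            # Start from the tree's shared equations, then add the class's own.
            equations = result.setdefault(tree_name, list(tree["common"]))
            equations.extend(tree["classes"][class_name])
    return result


print(anchored_trees("for", TREES, LEXICON))
```

Under this reading, adding a new complementizer only requires a lexicon line naming an existing class; no feature equations are repeated per word.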
Example clusters of words generated from the current XTAG syntactic database
are shown below (this is a fraction of the real output of an automatic
conversion):
A.1     : A Tnx0A1s1 #A_WH- #S1_ind_that #A_compar-
A.2     : A Tnx0A1s1 #A_WH- #S1_ind_that-nil #A_compar-
wild    : A Tnx0A1s1 #A_WH- #S1_inf_nil #A_compar-
curious : A Tnx0A1s1 #S1_ind-inf_whether-if #A_WH- #A_compar-
likely  : A Tnx0A1s1 Ts0A1s1 #A_WH- #INF_S1_COMP #A_compar-
certain : A Tnx0A1s1 Ts0A1s1 #A_WH- #S1_ind_that-nil #A_compar-
A.3     : A Tnx0Ax1 #A_WH- #A_compar-
where
A.1 = cheery, devastated, clear, fortunate, mistaken, emotional, ...
A.2 = nervous, positive, sure
A.3 = Eskimo, Minoan, Malaysian, Mandarin, Doric, European, Libyan, Latin,
      IndoEuropean, ...
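
The conversion script itself is not included in this message. As a hedged sketch of how such clusters could be formed, the code below groups words whose part of speech, trees, and feature tags are identical, names multi-word groups POS.1, POS.2, ..., and leaves a word with a unique signature under its own name. The entries are copied from the listing above; the input structure, function names, and numbering scheme are assumptions, so the cluster numbers it produces will not match the ones shown.

```python
from collections import defaultdict

# Hypothetical per-word entries: word -> (part of speech, trees, feature tags).
# The values are taken from the listing above; the structure is illustrative.
ENTRIES = {
    "nervous": ("A", ("Tnx0A1s1",), ("#A_WH-", "#S1_ind_that-nil", "#A_compar-")),
    "positive": ("A", ("Tnx0A1s1",), ("#A_WH-", "#S1_ind_that-nil", "#A_compar-")),
    "sure": ("A", ("Tnx0A1s1",), ("#A_WH-", "#S1_ind_that-nil", "#A_compar-")),
    "wild": ("A", ("Tnx0A1s1",), ("#A_WH-", "#S1_inf_nil", "#A_compar-")),
}


def cluster_words(entries):
    """Group words whose part of speech, trees, and feature tags are identical."""
    clusters = defaultdict(list)
    for word, (pos, trees, tags) in entries.items():
        # Order-insensitive signature, so the listing order of tags is ignored.
        signature = (pos, frozenset(trees), frozenset(tags))
        clusters[signature].append(word)
    return clusters


def label_clusters(clusters):
    """Name multi-word clusters POS.1, POS.2, ...; a singleton keeps its word."""
    labelled = {}
    counters = defaultdict(int)
    for (pos, _trees, _tags), words in clusters.items():
        if len(words) == 1:
            labelled[words[0]] = words
        else:
            counters[pos] += 1
            labelled[f"{pos}.{counters[pos]}"] = sorted(words)
    return labelled


for label, words in label_clusters(cluster_words(ENTRIES)).items():
    print(label, "=", ", ".join(words))
```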