summary of reverse indexing




Here is a short summary of the new lexicon structure that was discussed 
during today's XTAG meeting:

Currently, the features are stored across the tree database and the syntactic 
database:

For example, take the betaComps tree:

    S_r
   /  \
Comp@  S_f*

Some features are common to all anchors and are stored with the tree itself 
in the .trees file,

e.g.  S_r:<a> = S_f:<a>

In the lexicon, each word then assigns additional features to the tree,

e.g.  that  betaComps  S_r:<assign-comp> = ind
      for   betaComps  S_r:<assign-comp> = inf

The new proposal would change the above to the following:

For the betaComps tree, the features stored *with* the tree will be:

e.g.
S_r:<a> = S_f:<a>
comp-class1 :- S_r:<assign-comp> = ind
comp-class2 :- S_r:<assign-comp> = inf

The lexicon will then carry no features at all, and in fact will not even 
select trees; each word only names a feature class:

e.g.
that comp-class1
for  comp-class2
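
To make the lookup direction concrete, here is a minimal Python sketch of how 
a word could be resolved under the proposal. The dictionary layout and the 
names TREE_FEATURES, LEXICON, build_class_index and resolve are assumptions 
made for illustration only; they are not the actual .trees or lexicon file 
formats, nor an existing XTAG interface.

```python
# Hypothetical in-memory layout; the real .trees and lexicon formats differ.
TREE_FEATURES = {
    "betaComps": {
        # Equations shared by every anchor of the tree.
        "common": ["S_r:<a> = S_f:<a>"],
        # Named feature classes defined with the tree.
        "classes": {
            "comp-class1": ["S_r:<assign-comp> = ind"],
            "comp-class2": ["S_r:<assign-comp> = inf"],
        },
    },
}

# The lexicon only names classes: no feature equations, no tree names.
LEXICON = {"that": ["comp-class1"], "for": ["comp-class2"]}


def build_class_index(tree_features):
    """Reverse index: class name -> list of trees that define that class."""
    index = {}
    for tree, entry in tree_features.items():
        for cls in entry["classes"]:
            index.setdefault(cls, []).append(tree)
    return index


def resolve(word, lexicon=LEXICON, tree_features=TREE_FEATURES):
    """Return (tree, equations) pairs selected by the word's classes."""
    index = build_class_index(tree_features)
    results = []
    for cls in lexicon.get(word, []):
        for tree in index.get(cls, []):
            entry = tree_features[tree]
            results.append((tree, entry["common"] + entry["classes"][cls]))
    return results


if __name__ == "__main__":
    for word in ("that", "for"):
        print(word, resolve(word))
```

In this sketch the tree is found through the class name rather than being 
listed with the word, which is one way to read "the lexicon will not even 
select trees".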

Below are example clusters of words generated from the current XTAG 
syntactic database (this is a fraction of the real output of an automatic 
conversion):

A.1 : A Tnx0A1s1 #A_WH- #S1_ind_that #A_compar-
 
A.2 : A Tnx0A1s1 #A_WH- #S1_ind_that-nil #A_compar-
 
wild : A Tnx0A1s1 #A_WH- #S1_inf_nil #A_compar-
 
curious : A Tnx0A1s1 #S1_ind-inf_whether-if #A_WH- #A_compar-
 
likely : A Tnx0A1s1 Ts0A1s1 #A_WH- #INF_S1_COMP #A_compar-
 
certain : A Tnx0A1s1 Ts0A1s1 #A_WH- #S1_ind_that-nil #A_compar-
 
A.3 : A Tnx0Ax1 #A_WH- #A_compar-

where

A.1 = cheery, devastated, clear, fortunate, mistaken, emotional, ...

A.2 = nervous, positive, sure

A.3 = Eskimo, Minoan, Malaysian, Mandarin, Doric, European, Libyan, Latin, 
IndoEuropean, ...
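
The clustering itself can be sketched in Python as grouping words that share 
the same part of speech, trees and feature classes. This is only a guess at 
what the automatic conversion does, not its actual code: the ENTRIES tuples 
and the cluster function are hypothetical, and the sketch assumes that the 
order of the #-classes within an entry is not significant.

```python
from collections import defaultdict

# Hypothetical flat entries: (word, part of speech, trees, feature classes).
# The values are copied from the clusters listed above.
ENTRIES = [
    ("nervous", "A", ["Tnx0A1s1"], ["#A_WH-", "#S1_ind_that-nil", "#A_compar-"]),
    ("positive", "A", ["Tnx0A1s1"], ["#A_WH-", "#S1_ind_that-nil", "#A_compar-"]),
    ("sure", "A", ["Tnx0A1s1"], ["#A_WH-", "#S1_ind_that-nil", "#A_compar-"]),
    ("wild", "A", ["Tnx0A1s1"], ["#A_WH-", "#S1_inf_nil", "#A_compar-"]),
    ("Eskimo", "A", ["Tnx0Ax1"], ["#A_WH-", "#A_compar-"]),
    ("Latin", "A", ["Tnx0Ax1"], ["#A_WH-", "#A_compar-"]),
]


def cluster(entries):
    """Group words whose part of speech, trees and classes are identical."""
    groups = defaultdict(list)
    for word, pos, trees, classes in entries:
        # Sorting makes the signature independent of listing order (assumption).
        signature = (pos, tuple(sorted(trees)), tuple(sorted(classes)))
        groups[signature].append(word)
    return dict(groups)


if __name__ == "__main__":
    for (pos, trees, classes), words in cluster(ENTRIES).items():
        print(", ".join(words), ":", pos, " ".join(trees), " ".join(classes))
```

A signature shared by several words corresponds to a named cluster such as 
A.2 or A.3, while a signature with a single word, like the one for wild, 
stays listed under the word itself.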