We evaluated the XTAG parser on the text chunking task [#!abney91!#]. In particular, we compared the NP chunks and verb group (VG) chunks produced by the XTAG parser with the NP and VG chunks from the Penn Treebank [#!marcus93!#]. The test involved 940 sentences of 15 words or less from sections 17 to 23 of the Penn Treebank, parsed using the XTAG English grammar. The results are given in Table G.3.
              NP Chunking   VG Chunking
Recall           82.15%        74.51%
Precision        83.94%        76.43%

Table G.3: Text chunking performance of the XTAG parser
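For reference, the recall and precision figures in Table G.3 can be computed by exact matching of chunks against the Treebank. The sketch below assumes chunks are encoded as (start, end, label) spans over a shared tokenization; the span format and function name are illustrative assumptions, not part of the XTAG tools.

    # A minimal sketch of chunk-level scoring, assuming chunks are
    # (start, end, label) spans. A predicted chunk counts as correct only
    # if an identical labeled span appears in the gold standard.
    def chunk_scores(gold, predicted):
        gold_set = set(gold)
        pred_set = set(predicted)
        correct = len(gold_set & pred_set)
        recall = correct / len(gold_set) if gold_set else 0.0
        precision = correct / len(pred_set) if pred_set else 0.0
        return recall, precision

    # Example: one of two predicted NP chunks matches the gold chunks.
    gold = [(0, 1, "NP"), (3, 5, "NP")]
    pred = [(0, 1, "NP"), (3, 4, "NP")]
    print(chunk_scores(gold, pred))  # prints (0.5, 0.5)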
System                                           Training Size   Recall   Precision
Ramshaw & Marcus                                 Baseline        81.9%    78.2%
Ramshaw & Marcus (without lexical information)   200,000         90.7%    90.5%
Ramshaw & Marcus (with lexical information)      200,000         92.3%    91.8%
Supertags                                        Baseline        74.0%    58.4%
Supertags                                        200,000         93.0%    91.8%
Supertags                                        1,000,000       93.8%    92.5%

Table G.4: Performance comparison of the transformation-based noun chunker and the supertag-based noun chunker
As described earlier, these results cannot be directly compared with other chunking results, such as those in [#!lance&mitch95!#], since we do not train on the Treebank before testing. However, in earlier work, text chunking was done using a technique called supertagging [#!srini97iwpt!#], which uses the XTAG English grammar and can be trained on the Treebank. The comparative text chunking results for supertagging and other chunking methods are shown in Table G.4.
We also performed experiments to determine the accuracy of the derivation structures produced by XTAG on WSJ text, where the derivation tree produced by the XTAG parser is interpreted as a dependency parse. We took sentences of 15 words or less from sections 17-23 of the Penn Treebank [#!marcus93!#]; 9891 of these sentences were given at least one parse by the XTAG system. Since XTAG typically produces several derivations for each sentence, we simply picked a single derivation from the list for this evaluation. Better results might be achieved by ranking the output of the parser using the sort of approach described in [#!srinietal95!#].
There were some striking differences between the dependencies implicit in the Treebank and those given by XTAG derivations. For instance, a subject NP in the Treebank is often linked with the first auxiliary verb in the tree, either a modal or a copular verb, whereas in the XTAG derivation the same NP is linked to the main verb. Also, XTAG produces some dependencies within an NP, while a large number of words in NPs in the Treebank are directly dependent on the verb. To normalize for these differences, we took the output of the NP and VG chunker described above and accepted as correct any dependencies that were completely contained within a single chunk.
For example, for the sentence "Borrowed shares on the Amex rose to another record", the XTAG and Treebank chunks are shown below.
XTAG chunks:
[Borrowed shares] [on the Amex] [rose]
[to another record]
Treebank chunks:
[Borrowed shares on the Amex] [rose]
[to another record]
Using these chunks, we can normalize for the fact that in the dependencies produced by XTAG, borrowed is dependent on shares (i.e., the two words are in the same chunk), while in the Treebank borrowed is directly dependent on the verb rose. That is to say, we look at links between chunks, not between words. The dependencies for the sentence are given below, followed by a sketch of this chunk-based normalization.
XTAG dependency      Treebank dependency
Borrowed::shares     Borrowed::rose
shares::rose         shares::rose
on::shares           on::shares
the::Amex            the::Amex
Amex::on             Amex::on
rose::NIL            rose::NIL
to::rose             to::rose
another::record      another::record
record::to           record::to
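The normalization and matching step can be sketched as follows, assuming dependencies are (dependent, head) word pairs and chunks are sets of words; the names and data structures here are illustrative, not the actual evaluation code. An XTAG link counts as correct if it matches a Treebank link exactly or if both of its words fall within one chunk.

    # A minimal sketch of the chunk-based normalization, under the
    # assumptions stated above.
    def normalized_matches(xtag_deps, treebank_deps, chunks):
        correct = 0
        for dep, head in xtag_deps:
            if (dep, head) in treebank_deps:
                correct += 1
            elif any(dep in chunk and head in chunk for chunk in chunks):
                # Both words lie inside a single chunk, so the link is
                # accepted regardless of how the Treebank attaches them.
                correct += 1
        return correct

    xtag = {("Borrowed", "shares"), ("shares", "rose"), ("to", "rose")}
    tbank = {("Borrowed", "rose"), ("shares", "rose"), ("to", "rose")}
    chunks = [{"Borrowed", "shares", "on", "the", "Amex"}, {"rose"},
              {"to", "another", "record"}]
    print(normalized_matches(xtag, tbank, chunks))  # prints 3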
After this normalization, testing simply consisted of counting how many of the dependency links produced by XTAG matched the Treebank dependency links. Due to some tokenization and subsequent alignment problems, we could only test on 835 of the original 9891 parsed sentences. There were a total of 6135 dependency links extracted from the Treebank, and the XTAG parses produced 6135 dependency links for the same sentences. Of the dependencies produced by the XTAG parser, 5165 were correct, giving an accuracy of 84.2% (5165/6135).