LTAG-spinal: Treebank and parsers
A new resource for incremental,
dependency and semantic parsing
LTAG-spinal is a novel
variant of traditional Lexicalized Tree
Adjoining Grammar (LTAG) with desirable linguistic, computational and
statistical properties. Unlike in traditional LTAG, subcategorization
frames and the argument-adjunct distinction are left underspecified in
LTAG-spinal. With adjunction constraints, this formalism is weakly
equivalent to LTAG. LTAG-spinal provides a desirable resource for
statistical LTAG parsing, incremental parsing, dependency parsing, and
semantic parsing.
This page is the companion website to the following dissertation:
Statistical LTAG Parsing. Libin Shen (2006). Ph.D. thesis. PDF.
For more recent information, please refer to the papers linked below.
LTAG-spinal Treebank
We extracted an LTAG-spinal Treebank from the Penn Treebank and
harmonized it with PropBank. Based on the PropBank annotation, we
successfully extracted predicate coordination and LTAG adjunction
structures. The LTAG-spinal Treebank makes explicit semantic relations
that are implicit in or absent from the original Penn Treebank.
- Read the paper: LTAG-spinal and the Treebank: A new resource for
incremental, dependency and semantic parsing. Libin Shen, Lucas
Champollion, and Aravind K. Joshi (in press). Language Resources and
Evaluation. Online [restricted] at SpringerLink. Pre-print version: PDF.
- Download the treebank [.zip, 18 MB]
- For copyright reasons, this public release does not include the
file with the Propbank annotations. Please contact us for a
license to the Propbank file.
- The format of this treebank is documented in Libin Shen's
thesis, Section 5.4.1, p.71 [pdf].
- View treebank README file [.txt]
- View sample treebank sentence [.txt, .jpg]
- Download visualizations of the treebank
- We have prepared three different versions of the treebank
converted to a graphical format. The files are in .dot format to save
space and can be converted to any of the usual formats using Graphviz.
The treebank API provides Java methods as well as a command-line tool
to produce this graphical output.
"Beanpoles" format [.zip, 65 MB]
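The conversion of the .dot files mentioned above can be scripted. The following is a minimal sketch, not part of the treebank API: it assumes that the Graphviz `dot` executable is installed and on the PATH, and the class name `DotConverter`, its methods, and the file name in the usage note are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical helper (not part of the LTAG-spinal API): converts one .dot
// file to an image by calling the Graphviz "dot" command-line tool.
class DotConverter {

    // Builds the command "dot -T<format> <input> -o <output>", where the
    // output name is the input name with ".dot" replaced by ".<format>".
    static List<String> buildCommand(String dotFile, String format) {
        String base = dotFile.endsWith(".dot")
                ? dotFile.substring(0, dotFile.length() - 4)
                : dotFile;
        return List.of("dot", "-T" + format, dotFile, "-o", base + "." + format);
    }

    // Runs Graphviz and returns its exit code (0 on success).
    // Assumes "dot" can be found on the PATH.
    static int convert(String dotFile, String format)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(dotFile, format))
                .inheritIO()
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("usage: java DotConverter.java <file.dot> [format]");
            return;
        }
        String format = args.length > 1 ? args[1] : "png";
        System.exit(convert(args[0], format));
    }
}
```

For example, `java DotConverter.java sentence.dot svg` would ask Graphviz to write `sentence.svg` next to the input file. The sketch uses `List.of` and therefore needs Java 9 or later.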
LTAG-spinal Parsers
We have used this treebank to train two novel statistical incremental
parsers: a left-to-right parser that produces full LTAG-spinal
annotation, and a bidirectional parser that produces derivation trees
without spines (similar to the output of a dependency parser). Both
achieve competitive results on our treebank, with the latter
significantly improving over the former. As far as we know, these
parsers are the first comprehensive attempt at efficient statistical
parsing with a formal grammar that has provably stronger generative
power than CFG.
- Left-to-right incremental LTAG-spinal parser
- Read the paper: Incremental LTAG Parsing. Libin Shen and Aravind K.
Joshi (2005). In: Proceedings of HLT/EMNLP. Vancouver, Canada.
Oct. 6-8, 2005. [.pdf]
- Download ready-to-launch application [.zip, 79 MB]
- View README file [.txt]
- View sample output [.txt, .jpg]
- Bidirectional incremental LTAG-spinal parser
- Read the paper: Bidirectional LTAG Dependency Parsing. Libin Shen
and Aravind K. Joshi (2006). Manuscript. [.pdf]
- Download ready-to-launch application [.zip, 44 MB]
- View README file [.txt]
- View sample output [.txt, .jpg]
We have also developed a POS tagger using the bidirectional search
strategy. The output of this POS tagger can be used as the input to the
parsers after a simple tag mapping. (The POS tagger is trained on the
CoNLL standard data set, so the tags ( and ) must be mapped to LRB and
RRB, respectively, to make the output compatible with the Penn Treebank
and LTAG-spinal Treebank annotation.)
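The tag mapping described above is small enough to sketch in a few lines. The example below is an illustration only, not code from the tagger or the parsers: the class `TagMapper` and its methods are hypothetical, and it assumes that the tagger output has already been read into a list of tag strings. The target strings LRB and RRB are taken as written on this page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of the tagger or the parsers): rewrites the
// bracket tags of the POS tagger output before it is passed to a parser.
class TagMapper {

    // Mapping as stated on this page: "(" becomes "LRB", ")" becomes "RRB".
    private static final Map<String, String> BRACKET_TAGS =
            Map.of("(", "LRB", ")", "RRB");

    // Returns the mapped tag, or the tag unchanged if it is not a bracket tag.
    static String mapTag(String tag) {
        return BRACKET_TAGS.getOrDefault(tag, tag);
    }

    // Maps a whole tag sequence, leaving the input list untouched.
    static List<String> mapTags(List<String> tags) {
        List<String> mapped = new ArrayList<>(tags.size());
        for (String tag : tags) {
            mapped.add(mapTag(tag));
        }
        return mapped;
    }

    public static void main(String[] args) {
        // Prints [DT, NN, LRB, NN, RRB, .]
        System.out.println(mapTags(List.of("DT", "NN", "(", "NN", ")", ".")));
    }
}
```

All other tags pass through unchanged, so the mapping can be applied to every token without checking first. The sketch uses `Map.of` and `List.of`, so it needs Java 9 or later.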
New! Java API
We have developed a comprehensive API in the Java programming
language (compatible with Java 1.4 or higher). The API provides full
read access to the data structures of the LTAG-spinal treebank, the
modified version of the Propbank, as well as the output of the two
parsers. The API is licensed under GNU GPL v.3. Please contact us if
this license does not meet your particular needs.
- Download the API [.jar] and put it into your classpath for immediate use.
- Download the source code [.zip].
- Download the documentation [.zip].
- Get it all in one file: source code, jar, and documentation [.zip].
Acknowledgements
We are grateful to Ryan Gabbard, who has contributed to the code for
the LTAG-spinal API. We thank Martha Palmer for generously providing
the Propbank API (originally written by Scott Cotton) for us to include
in our treebank API. We also thank Julia Hockenmaier, Mark Johnson,
Yudong Liu, Mitch Marcus, Sameer Pradhan, Anoop Sarkar, and the CLRG
and XTAG groups at Penn for helpful discussions.
Contact us
For all inquiries, feel free to contact Lucas Champollion:
Page maintained by Lucas Champollion
Last modified: 12/23/2007