We will use the terms lhs, rhs and inp as introduced above
to refer to the parts of a generic metarule being applied to an input tree.
The nodes at lhs can take three
different forms: a constant node, a typed variable node, and a non-typed
variable node. The naming conventions for these classes of nodes are
given below.
- Constant Node: Its name must not begin with a question mark
(`?'). Constant node names follow the same conventions used in normal
XTAG trees; for instance, inp is expected to have only constant
nodes. Some examples of constant nodes are NP, V, NP0, NP1,
Sr. We will call the two parts that compose such names
the stem and the subscript.
In the examples above NP, V and S are stems and
0, 1, r are subscripts. Notice that the
subscript part can also be empty.
- Non-Typed Variable Node: Its name begins with a question
mark (`?'), followed by a sequence of digits (i.e., a number) which
uniquely identifies the variable. Examples: ?1, ?3,
?3452. There is no stem and no subscript in these
names, i.e., `?' is just a meta-character to introduce a variable, and
the number is the variable identifier.
- Typed Variable Node: Its name begins with a question mark (`?')
followed by a sequence of digits, but is additionally followed by a
type specifiers definition. A type specifiers definition is
a sequence of one or more type specifiers separated by slashes
(`/'). A type specifier has the same form as a regular XTAG node
name (like the constant nodes), except that the subscript can also be a
question mark. Examples of typed variables are: ?1VP (a single type
specifier with stem VP and no subscript), ?3NP1/PP (two type
specifiers, NP1 and PP), ?1NP? (one type specifier, NP? with
undetermined subscript). Each type specifier represents an alternative
for matching, and the presence of `?' in subscript position of a type
specifier means that matching will only check the stem.
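The naming conventions above can be sketched as a small Python classifier. This is only an illustration and not part of the XTAG tools: the names parse_node_name, split_spec and NodeName are hypothetical, and the rule used to separate a stem from its subscript (leading uppercase letters, followed by digits or lowercase letters) is an assumption that merely fits the examples given in this section.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumption (not stated in the text): a stem is a run of uppercase letters,
# and a subscript is a possibly empty run of digits/lowercase letters, or a
# single '?' when the subscript is left undetermined in a type specifier.
_SPEC = re.compile(r"^([A-Z]+)([0-9a-z]*|\?)$")
_VAR = re.compile(r"^\?(\d+)(.*)$")


@dataclass(frozen=True)
class NodeName:
    kind: str                            # "constant", "untyped" or "typed"
    var_id: Optional[int]                # variable identifier, None for constants
    specs: Tuple[Tuple[str, str], ...]   # (stem, subscript) pairs


def split_spec(text: str) -> Tuple[str, str]:
    """Split a node name or type specifier into (stem, subscript)."""
    m = _SPEC.match(text)
    if m is None:
        raise ValueError(f"not a recognizable node name: {text!r}")
    return m.group(1), m.group(2)


def parse_node_name(name: str) -> NodeName:
    """Classify a node name according to the three classes described above."""
    if not name.startswith("?"):
        stem, subscript = split_spec(name)
        if subscript == "?":
            # '?' subscripts are only described for type specifiers.
            raise ValueError(f"constant node with '?' subscript: {name!r}")
        return NodeName("constant", None, ((stem, subscript),))
    m = _VAR.match(name)
    if m is None:
        raise ValueError(f"variable without a numeric identifier: {name!r}")
    var_id, rest = int(m.group(1)), m.group(2)
    if rest == "":
        return NodeName("untyped", var_id, ())
    # One or more type specifiers separated by '/'.
    return NodeName("typed", var_id, tuple(split_spec(s) for s in rest.split("/")))
```

Under these assumptions, parse_node_name("?3NP1/PP") yields a typed variable with identifier 3 and the two alternatives (NP, 1) and (PP, empty subscript), while parse_node_name("?3452") yields a non-typed variable with no type specifiers.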
During the process of matching, variables are associated (we use the
term instantiated) with `tree material'. According to its class,
a variable may be instantiated with different kinds of tree material:
- A typed variable will be instantiated with exactly one node of
the input tree, which must agree with one of its type specifiers
(the full rule is given in the following subsection).
- A non-typed variable will be instantiated with a sequence of subtrees.
These subtrees are taken from one of the nodes of the input tree
inp. Hence, there will be a node n in inp, with subtrees
n.t1, n.t2, ..., n.tk, in this order, such that the variable
is instantiated with some subsequence of these subtrees
(e.g., n.t2, n.t3, n.t4). Note, however, that some of these
subtrees may be incomplete, i.e., they may not go all the way to the
bottom leaves. Entire subtrees may be removed. Specifically, for each
child of the non-typed variable node, one subtree that matches this
child subtree will be removed from some n.ti (possibly an entire
n.ti), leaving in place a mark for inserting material during the
substitution of occurrences at rhs.
Notice also that the variable may
be instantiated with a single tree or even with no tree.
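The two kinds of instantiation can be illustrated with a rough Python sketch. It rests on several assumptions that go beyond the text: a type specifier and an input node are both represented as a (stem, subscript) pair, a subscript other than `?' is taken to require exact equality (the full rule is deferred to the following subsection), and the subsequence chosen for a non-typed variable is taken to be contiguous, as in the example n.t2, n.t3, n.t4. The removal of material from incomplete subtrees is not modeled. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# (stem, subscript); in a type specifier, subscript "?" means "undetermined".
Spec = Tuple[str, str]


def spec_accepts(spec: Spec, node: Spec) -> bool:
    """True if one type specifier accepts the given input node."""
    stem, subscript = spec
    node_stem, node_subscript = node
    if stem != node_stem:
        return False
    # A '?' subscript means only the stem is checked.
    return subscript == "?" or subscript == node_subscript


def typed_accepts(specs: Sequence[Spec], node: Spec) -> bool:
    """Each type specifier is an alternative: one acceptance is enough."""
    return any(spec_accepts(spec, node) for spec in specs)


def contiguous_runs(children: Sequence[str]) -> List[List[str]]:
    """Candidate instantiations of a non-typed variable at one node:
    the empty sequence plus every contiguous run of that node's subtrees."""
    runs: List[List[str]] = [[]]
    for start in range(len(children)):
        for end in range(start + 1, len(children) + 1):
            runs.append(list(children[start:end]))
    return runs
```

For a node with subtrees t1, t2, t3, contiguous_runs returns seven candidates: the empty sequence, three single subtrees, two pairs, and the full sequence. Having several candidates of this kind for a single variable is what allows one metarule to match the same input tree in more than one way.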
We define a match to be a complete instantiation of all variables
appearing in the metarule. In the process of matching, there may be several
possible ways of instantiating the set of variables of the metarule, i.e.,
several possible matches. This is due to the presence of non-typed variables.
Now we are ready to define what we mean by a successful matching. The process
of matching is successful
if the number of possible matches is greater than 0.
When there is no possible match the process is said to fail.
In addition to returning success or failure, the
matching process also returns the set of
all possible matches, which will be used for generating the output.
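The result of the matching process, as described above, can be pictured as a small Python data structure. This is a sketch under stated assumptions, not the actual XTAG implementation: a match is represented as a dictionary from variable names to whatever tree material was instantiated, and MatchResult and collect_matches are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List

# Hypothetical representation: variable name -> instantiated tree material.
Match = Dict[str, object]


@dataclass
class MatchResult:
    matches: List[Match] = field(default_factory=list)

    @property
    def success(self) -> bool:
        # Matching succeeds when at least one match exists.
        return len(self.matches) > 0


def collect_matches(candidates: Iterable[Match], variables: Iterable[str]) -> MatchResult:
    """Keep only complete instantiations, i.e. those covering every variable."""
    required = set(variables)
    complete = [m for m in candidates if required.issubset(m)]
    return MatchResult(complete)
```

A caller would test result.success first and then iterate over result.matches to generate the output, one output per match.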
XTAG Project
http://www.cis.upenn.edu/~xtag