We will use the terms lhs, rhs and inp, as introduced above,
to refer to the parts of a generic metarule being applied to an input tree.
The nodes at lhs can take three
different forms: a constant node, a typed variable node, and a non-typed
variable node. The naming conventions for these classes of nodes are
given below.
- Constant Node: Its name must not begin with a question mark
(`?' character). Such names are the ordinary node names found in normal
XTAG trees; for instance, inp is expected to have only constant
nodes. Some examples of constant nodes are NP, V, NP0, NP1,
Sr. We will call the two parts that compose such names
the stem and the subscript.
In the examples above NP, V and S are stems and
0, 1, r are subscripts. Notice that the
subscript part can also be empty, as in two of the above examples (NP and V).
- Non-Typed Variable Node: Its name begins with a question
mark (`?'), followed by a sequence of digits (i.e. a number) which
uniquely identifies the variable. Examples: ?1, ?3,
?3452. We assume that
there is no stem and no subscript in these names, i.e., `?' is just
a meta-character to introduce a variable, and the number is the
variable identifier.
- Typed Variable Node: Its name begins with a question mark (`?')
followed by a sequence of digits, but is additionally followed by
a type specifiers definition. A type specifiers definition
is a sequence of one or more type specifiers separated by slashes
(`/'). A type specifier has the same form as a regular XTAG node
name (like the constant nodes), except that the subscript can also be
a question mark. Examples of typed variables are:
?1VP (a single type specifier with stem VP and no subscript),
?3NP1/PP (two type specifiers, NP1 and PP),
?1NP? (one type specifier, NP?, with undetermined subscript).
We will see below that each type specifier represents an alternative
for matching, and that the presence of `?' in the subscript position of a
type specifier means that matching will only check the stem.
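The naming conventions above can be illustrated with a small classifier. This is only a sketch, not part of the XTAG tools: the names `NodeName` and `parse_node_name` are hypothetical, and the way a name is split into stem and subscript (a run of uppercase letters followed by optional digits or lowercase letters) is an assumption inferred from the examples NP0, NP1 and Sr.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumption: a stem is a run of uppercase letters; the subscript is whatever
# digits or lowercase letters follow it (possibly nothing).
_CONSTANT = re.compile(r"([A-Z]+)([0-9a-z]*)")
# A type specifier looks like a constant name, but its subscript may be `?'.
_SPECIFIER = re.compile(r"([A-Z]+)([0-9a-z]*|\?)")
# A variable is `?' plus an identifying number, then optional type specifiers.
_VARIABLE = re.compile(r"\?([0-9]+)(.*)")


@dataclass
class NodeName:
    kind: str                                  # "constant", "untyped" or "typed"
    var_id: Optional[int] = None               # variable identifier, if any
    specs: Tuple[Tuple[str, str], ...] = ()    # (stem, subscript) pairs


def parse_node_name(name: str) -> NodeName:
    """Classify a node name according to the three classes described above."""
    if not name.startswith("?"):
        m = _CONSTANT.fullmatch(name)
        if m is None:
            raise ValueError(f"not a valid constant node name: {name!r}")
        return NodeName("constant", None, ((m.group(1), m.group(2)),))

    m = _VARIABLE.fullmatch(name)
    if m is None:
        raise ValueError(f"not a valid variable node name: {name!r}")
    var_id, rest = int(m.group(1)), m.group(2)
    if rest == "":
        return NodeName("untyped", var_id)

    specs = []
    for part in rest.split("/"):
        s = _SPECIFIER.fullmatch(part)
        if s is None:
            raise ValueError(f"bad type specifier {part!r} in {name!r}")
        specs.append((s.group(1), s.group(2)))
    return NodeName("typed", var_id, tuple(specs))
```

For instance, `parse_node_name("?3NP1/PP")` yields a typed variable with identifier 3 and the two specifiers `("NP", "1")` and `("PP", "")`, while `parse_node_name("?1NP?")` records `"?"` in the subscript position.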
During the process of matching, variables are associated (we use the
term instantiated) with `tree material'. Depending on its class,
a variable can be instantiated with different kinds of tree material:
- A typed variable will be instantiated with exactly one node of
the input tree, which must agree with one of its type specifiers
(the full rule is given in the following subsection).
- A non-typed variable will be instantiated with a range of subtrees.
These subtrees will be taken from one of the nodes of the input tree
inp. Hence, there will be a node n in inp, with subtrees
n.t1, n.t2, ..., n.tk, in this order, where the variable
will be instantiated with some subsequence of these subtrees
(e.g., n.t2, n.t3, n.t4). Note, however, that some of these
subtrees may be incomplete, i.e., they may not go all the way to the
bottom leaves: entire subtrees may be removed from them. In fact, for each
child of the non-typed variable node, one subtree that matches this
child subtree will be removed from one of the n.ti (possibly an entire
n.ti), leaving in place a mark for inserting material during the
substitution of occurrences at rhs.
Notice also that the variable can
be instantiated with a single tree, or even with no tree at all.
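The two kinds of instantiation can be sketched as follows. This is an illustration under stated assumptions, not the actual matching rule (which is given in the following subsection): the function names are hypothetical, a `range' of subtrees is read as a contiguous run of children, a specifier with an empty subscript is assumed to require an empty subscript on the node, and the removal of matched subtrees (the `marks' mentioned above) is not modelled.

```python
from typing import List, Sequence, Tuple

Name = Tuple[str, str]  # (stem, subscript), e.g. ("NP", "1")


def typed_variable_accepts(specs: Sequence[Name], node: Name) -> bool:
    """True if `node` agrees with at least one type specifier.

    A `?' subscript in a specifier means only the stem is compared.
    Assumption: any other subscript, including the empty one, must be equal.
    """
    stem, sub = node
    return any(
        s_stem == stem and (s_sub == "?" or s_sub == sub)
        for s_stem, s_sub in specs
    )


def untyped_variable_candidates(children: Sequence[str]) -> List[Tuple[str, ...]]:
    """All contiguous ranges of the children of n, including the empty range."""
    k = len(children)
    ranges: List[Tuple[str, ...]] = [()]        # "no tree" is also allowed
    for start in range(k):
        for end in range(start + 1, k + 1):
            ranges.append(tuple(children[start:end]))
    return ranges
```

With three children `["t1", "t2", "t3"]` the second function lists seven candidates, among them `("t2", "t3")` and the empty range, but not the non-contiguous pair `("t1", "t3")`. The number of candidates grows quadratically with the number of children, which is one reason a metarule with non-typed variables can have several matches.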
We define a match to be a complete instantiation of all variables
appearing in the metarule. In the process of matching, there may be several
possible ways of instantiating the set of variables of the metarule, i.e.,
several possible matches. This is due to the presence of non-typed variables.
We are now ready to define what we mean by a successful matching. The process
of matching is successful
if the number of possible matches is greater than 0.
When there is no possible match, the process is said to fail.
In addition to returning success or failure, the process also returns the set of
all possible matches, which will be used for generating the output.
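The result of the matching process can be pictured as a pair: a success flag and the collection of matches. The sketch below only illustrates that definition; the `Match` representation (a mapping from variable names to tree material) and the function name `matching_outcome` are assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Assumption: a match maps each variable name to the tree material
# it was instantiated with (represented here as an opaque object).
Match = Dict[str, object]


def matching_outcome(matches: List[Match]) -> Tuple[bool, List[Match]]:
    """Matching succeeds exactly when at least one match exists.

    The matches are returned alongside the flag, since they are what
    the output generation step consumes.
    """
    return (len(matches) > 0, matches)
```

An empty list of matches therefore gives `(False, [])`, which corresponds to the process failing.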
XTAG Project
1998-09-14