Synchronous TAG Documentation

ALGORITHMS AND IMPLEMENTATION

Decoding the transfer data files (file: transfer-lexicon.lisp)

Reading input

The transfer data is described in files. The files are read and decoded by different finite state automatons.

While reading a file, the input is composed of two parts: a string and a delimiter. The string is any sequence of characters which are not delimiters, the delimiter is the first special delimiter that follows the string, ignoring some meaningless delimiters (space, tab, newline and ",").

The transitions in the fsa are defined in terms of these two imputs, string and delimiter, which can be null (not both in the same input): a null string occurs when there are two consecutive special delimiters, a null delimiter when two strings are separated by meaningless delimiters.

A possibility to differ input has been added to the basic input functions, to allow transitions that don't use their input (or part of). The input functions first look for differed input before reading the files.

The main functions for input are:

read-input (stream)
basic read, returns two values: the string and the special delimiter
test-input (stream char)
tests if the first character is the given char. If true, returns t and removes the char from the stream. If not, returns nil.
peek-input (stream)
peeks the first char of the stream
differ-input (delimiter &optional string)
records delimiter and string in the differ input buffer, to be read by the next input function

Errors

When reading the files, errors can be reported (syntax errors especially). To have a better indication of the place where an error occurs in a definition, the definition while being read is echoed on an other stream, which can be read when an error occurs, to show the characters that have been read so far.

FSA

In the descriptions below, the transitions from one state to another are recorded with the possible inputs:

string . a string and no delimiter
stringX . a string and any delimiter
string<char> . a string and the delimiter <char> (ex: string: or string/ ...)
<char> . no string and the delimiter <char>
<nothing> . no conditions: any string, any delimiter

In the right side of the description, after ";" characters, are the interpretations of the strings or the name of another fsa which is called by the transition, in capital letters.

FAMILY DEFINITION FSA:

The family transfer definitions:

       (0) --- string/ ---> (1)         ; family name 1
       (1) --- string: ---> (2)         ; family name 2

       (2) --- string: ---> (3)         ; keyword
       (2) ------ { ------> (4)

       (3) ------ { ------> (4)

       (4) ---------------> (5)         ; NODE PAIRS FSA

       (5) ------ . ------> end
       (5) ---------------> (5)         ; TREE PAIRS FSA

TREE PAIRS FSA and TREE DEFINITION FSA:

The tree pairs and tree definition fsa's:

      (0) ---------------> (1)          ; TREE FSA
      (1) ------ / ------> (2)

      (2) ---------------> (3)          ; TREE FSA

      (3) ------ : ------> (4)
      (3) --- string ----> (0.1)

      (4) --- string: ---> (5)          ; keyword (tree def)
      (4) ------ { ------> end          ; NODE PAIR FSA (family def)
      (4) ------ { ------> (6)          ; NODE PAIR FSA (tree def)

      (5) ------ { ------> (6)          ; NODE PAIR FSA

      (6) ------ . ------> end          ; (in tree def)

TREE FSA:

The fsa for the tree definition file:

      (0) --- string/ ---> end          ; tree name
      (0) --- string: ---> end          ; tree name
      (0) --- string{ ---> (1)          ; tree name + FEATURE FSA
      (0) --- string[ ---> end          ; tree name + DERIVATION FSA
      (0) --- string ----> end          ; tree name

      (1) ------ / ------> end
      (1) ------ : ------> end
      (1) ------ [ ------> end          ; DERIVATION FSA
      (1) --- string ----> end          ; tree name for next tree pair

NODE PAIRS FSA:

The fsa for the node pair mapping file:

      (0) --- string: ---> (0.1)        ; type specification
      (0) --- string/ ---> (2)          ; node 1
      (0) --- string# ---> (1)          ; node 1
      (0) ------ } ------> end

      (0.1) idem (0) except -> (0.1)

      (1) --- string/ ---> (2)          ; tree id for node 1

      (2) --- string ----> (0)          ; node 2, other node pairs
      (2) --- string# ---> (3)          ; node 2
      (2) --- string} ---> end          ; node 2

      (3) --- string ----> (0)          ; tree id for node 2, other node pairs
      (3) --- string} ---> end          ; tree id for node 2

LEXICAL DEFINITION FSA:

The fsa for the lexical transfer file:

      (0) ---------------> (1)          ; LEX ENTRY FSA
      (1) ------ / ------> (2)

      (2) ---------------> (3)          ; LEX ENTRY FSA

      (3) ------ : ------> (4)
      (3) ------ . ------> end

      (4) --- string: ---> (4)          ; keyword
      (4) --- string. ---> end          ; keyword
      (4) ------ { ------> (5)          ; NODE PAIRS FSA

      (5) ------ . ------> end

GLOBAL DEFINITION FSA:

The fsa for the global lexical transfer file:

      (0) ---------------> (1)          ; LEX ENTRY FSA
      (1) ------ / ------> (2)

      (2) ---------------> (3)          ; LEX ENTRY FSA

      (3) ------ : ------> (4)

      (4) ------ { ------> (5)          ; NODE PAIRS FSA

      (5) ------ . ------> end

LEX ENTRY FSA:

The fsa for the transfer of lexical entries:

      (0) --- string ----> (1)          ; lexical entry
      (0) --- string{ ---> end          ; lexical entry + FEATURE FSA (lex def)
      (0) --- string/ ---> end          ; lexical entry
      (0) --- string: ---> end          ; lexical entry
      (0) --- string. ---> end          ; lexical entry

      (1) --- string{ ---> (2)          ; tree/family name + FEATURE FSA
      (1) --- string[ ---> end          ; tree name + DERIVATION FSA (global def)
      (1) --- string/ ---> end          ; tree/family name
      (1) --- string: ---> end          ; tree/family name
      (1) --- string. ---> end          ; tree/family name

      (2) ------ / ------> end
      (2) ------ : ------> end
      (2) ------ . ------> end
      (2) ------ [ ------> end          ; DERIVATION FSA (global def)

DERIVATION FSA:

The fsa to recognize the mapping between derivation trees:

      (0) --- string ----> (1)          ; lexical entry
      (1) --- stringX ---> (2)          ; tree name

      (2) --- string ----> (0)          ; new son
      (2) ------ { ------> (2)          ; FEATURE FSA
      (2) ------ ( ------> (2)          ; ADDRESS FSA
      (2) ------ # ------> (3)
      (2) ------ [ ------> (4)
      (2) ------ ] ------> end

      (3) --- stringX ---> (2)          ; tree id

      (4) --- stringX ---> (0)          ; new son
      (4) ------ ] ------> end

ADDRESS FSA:

The fsa for tree address recognition:

      (0) --- string ----> (0)          ; number in address
      (0) --- string) ---> end          ; last number in address
      (0) ------ ) ------> end          ; null address (root)

FEATURE FSA:

as in equations or lexicon for grammars

SIMPLE FEATURE FSA:

A fsa for simple features:

      (0) --- string/ ---> (1)          ; head category 1
      (1) --- string: ---> (2)          ; head category 2

      (2) ------ . ------> end
      (2) ---------------> (2)          ; FEATURE PAIR FSA

FEATURE PAIR FSA:

A fsa to recognize feature pairs:

      (0) ------ < ------> (1)
      (1) --- string ----> (1)          ; element of path 1
      (1) --- string> ---> (2)          ; element of path 1

      (2) ------ = ------> (3)

      (3) --- string/ ---> (4)          ; value 1

      (4) ------ < ------> (5)

      (5) --- string ----> (5)          ; element of path 2
      (5) --- string> ---> (6)          ; element of path 2

      (6) ------ = ------> (7)

      (7) --- string< ---> end          ; value 2, feature pair next
      (7) --- string. ---> end          ; value 2, end feature pairs

Organization of transfer data and access

(file transfer-manager.lisp)

Given a source derivation node, the transfer data base is searched for all the transfer structures that are compatible with the elementary tree of the derivation node: combination of tree and lexical transfers and global transfers.

The transfer manager stores the transfer data according to their type, and provides the relevant access index:

lexical transfers are indexed by the lexical entries (both languages),
tree transfers are indexed by the tree name (both languages), and grouped according to their target tree and keyword,
global transfers are indexed by the pair lexical entry and tree name (both languages).

Use of keywords:

Keywords are used for the combination of lexical transfers and tree transfers:

With non-lexicalized tree transfers, different transfers can be defined between two trees. For example, in the test data, we have two transfers between some trees of the families Tnx0Vnx1, <alpha>nx0Vnx1 for example.

The first one is a subject/subject and object/object transfer ("love/aimer"), and the second is a subject/object and object/subject transfer (AN EXAMPLE ?, there is "miss/manquer", but it is not the same families).

In such cases, it is necessary to make the two transfers different, as a lexical transfer between entries that belongs to this family pair will have to select one of them. This is the purpose of the keywords.

Each family (or tree) transfer will have ONE keyword associated with it (by default it is the null keyword, for easy use), and the a possibly ambiguous lexical transfer will have to specify some keywords to tell the transfer manager which tree transfers has to be used with this lexical transfer.

Algorithm (function find-possible-transfer-structures):

composite transfers (lex + tree combination) . find in the database the lexical transfers for the lexical entry, and the tree transfers for the tree . for each lexical transfer, if its restrictions are met (feature restrictions, family or tree restrictions) . for each possible target tree . select the tree transfers whose keyword is in the list of keywords provided by the lexical transfer, if its restrictions are met (source features, tree and lex compatibility of target) . if none are found, select the tree transfers with the null keyword (if its restrictions are met, as above)
global transfers . select transfers that fill the eventual feature restrictions for the root (the complete match of the derivation is done elsewhere, see above)

Translation application

The main function for transfer is transfer-sentence, which is defined in the file transfer.lisp.

The translation is done in three successive phases :

parse the source sentence (the standard n6parser is used).
(function parse-sentence, file transfer.lisp)
transfer a source derivation tree into a target derivation lattice, which
represents implicitly all possible target derivation trees for one source derivation tree. (function transfer-derivation-tree, file transfer.lisp)
from a target derivation lattice, generate all derivation trees, find the
morphological heads, and check the complete coherence (features and structural constraints). (function generate-target-derivation-trees, file generation.lisp)
the results are formatted and printed in a last phase.
(function generate-sentences, file sentences.lisp)

Transfer algorithm (file transfer.lisp)

Given a source derivation tree, construct the target derivation lattice from the transfer structures associated with each source derivation node.

function transfer-derivation-tree:

propagate all features in the source derivation tree
(function propagate-features, file feature-deriv.lisp)
recursively along the source derivation tree, built the target derivation
lattice isomorphic to the source derivation tree. (function build-transfer-structures, file transfer.lisp)

function build-transfer-structures:

find all possible transfer structure for the elementary tree of the source
derivation node, and select those that are compatible with the actual source data (feature restrictions, derivation matching for complex transfers). (function find-possible-transfer-structures, file transfer-manager.lisp)
transfer features for each transfer structure found.
(function transfer-features, file feature-transfer.lisp)
for each transfer structure found, and for each transfer structure
found for each son of the derivation node, check the local compatibilities between the parent and son transfer structures, to built the target derivation lattice by linking compatible target derivation nodes at the possible attachment nodes. different functions are used, depending on the type of transfer (simple or complex source, tree or feature target), in file transfer.lisp:

        build-transfer-structure-simple-to-tree,
        build-transfer-structure-complex-to-tree,
        build-transfer-structure-simple-to-feature,
        build-transfer-structure-complex-to-feature,

        see file transfer.lisp for more details.

Generation algorithm (file generation.lisp)

This phase is not really a generation phase, the term expansion phase should be more appropriate: expand the derivation lattice to find the implicit individual derivation trees in the target language.

The lattice is in fact an AND/OR graph, and in the current implementation is systematically traversed to find all solutions. From a node in the target lattice:

the son list is AND (every source son has to have a corresponding target),
for each source son, more than one target is possible (OR), and maybe more than one attachment location is possible for each (OR).

The algorithm is depth-first, top-down, and retrieves one solution at a time (function generate-target-derivation-trees):

an AND between two (or more) branches explores the first branch and adds the others in a waiting list (the argument rest below), with all relevant information (here the son, and a structure to transfer information from the source to the target, function btd-list-of-sons). When a leaf of the and/or tree is reached, we explore an other AND part in the waiting list. When the waiting list is empty, a solution is found (function btd-rest).
an OR is a simple iteration over the branches of the alternative, the new branch erasing the results of a previous one (function btd-son)

This algorithm is very inefficient, does a lot of redundant computations. It could be optimized by storing some intermediate results. Also, probably a bottom-up, breadth-first algorithm would be more efficient, if all complete individual derivation trees would have to be available at the same time, as such an algorithm would construct all solutions in parallel. The implemented one (with some improvements) could be used if some processing was to be done on each derivation tree, which can be discarded afterwards.

Each time a solution is found (function target-derivation-tree-complete), the individual target derivation tree has to be validated:

function validate-target-derivation-tree:

check if the derivation tree is well formed, ie that no two trees are
attached (adjoined or substituted at the same node), and discard the derivation tree if it is not correct. (function check-structural-correctness)
propagate all features in the derivation tree. If a clash occurs, the
derivation tree is discarded. (function propagate-features, file feature-deriv.lisp)
generate the morphological heads for each elementary tree in the derivation
tree. If it is not possible, the derivation tree is discarded. (function morphological-generation, file morpho-gener.lisp)

Morphological generation algorithm (file morpho-gener.lisp)

The goal of the morphological generation is to find the relevant morphological entry (or entries) for each of the derivation nodes, given the features that are requested by the structure of the derivation tree itself and the features that are proposed by transfer from morphological heads in the source language. As dependencies can exist between morphological heads of different nodes, the morphological generation cannot be done independently for each derivation node.

function morphological-generation:

detect all ambiguities (function detect-ambiguities): . for each of the derivation node, find which morphological heads are possible, given the features requested by the structure of the derivation tree. . if the word class has no morphological forms, the word class is OK, . otherwise : . if no morphological form is possible, the tree itself is not valid, . if only one morphological form is possible, this is the morphological head for this node, and propagate its features into the feature network . if more than one morphological form is possible, record the ambiguity
close the list of ambiguities (function check-ambiguity): . check if some of the non-ambiguous words have restricted ambiguous ones due to dependencies. Iterate the process until no more ambiguities are solved
generate all solutions, maximizing the match with transfered features (function transfer-features-for-ambiguities): . for each ambiguity, score each possible word and order them . order the ambiguities to consider first the less ambiguous ones (in terms of the number of words with highest score) . search for all solutions, maximizing the global score

Sentence generation (file sentences.lisp)

After the generation (expansion) phase, we have a set of derivation trees in the target language. Each elementary tree in each derivation tree has a list of the possible words within this derivation. The sentence generation is simply a projection of a derivation tree to a linear sequence of words (heads, terminal leaves of each elementary tree and attachments).

The generation is done by recursively traveling in the derivation tree, starting from the root of the derivation tree (function generate-sentence-internal). For each derivation node, the goal is to place in their relative order all elements that are associated to words: the sons of the derivation node and the heads (including terminals and attachments).

Ordering substituted sons and heads is simple (they are placed according to their address)
Ordering the words associated with adjoined sons is done by finding all sons and heads that are dominated by the adjoined sons and consider them as if they were placed at its foot node. They will be ordered relatively to each other when the foot node of the son will be considered.

Feature propagation in a derivation tree (file feature-deriv.lisp)

The feature propagation has two purposes. The first is to check the validity of the derivation tree by detecting feature clashes, and the second is to maintain all dependencies between features, which is necessary for morphological generation. We use therefore a destructive unification algorithm, to ensure the global dependencies.

The basic (recursive) function is propagate-features-internal, which operates on a derivation node:

merge inherited (parent) features with the local features (function merge-features-with-parent): . if the tree is adjoined, combine top parent features with its top root features and bottom parent features with its bottom foot features. . if the tree is substituted, combine parent features with its root features
propagate features recursively to all sons, transmitting the features of the node of attachment (adjunction or substitution)
at all nodes where no attachment has been made, unify top and bottom

[ Index Page | XTAG Page ]

This page is a modified version of the original written by Gilles Prigent (prigentg@lannion.cnet.fr). The page is currently maintained by Anoop Sarkar (anoop@linc.cis.upenn.edu).