The transfer data is described in files. The files are read and decoded by different finite state automatons.
While reading a file, the input is composed of two parts: a string and a delimiter. The string is any sequence of characters which are not delimiters, the delimiter is the first special delimiter that follows the string, ignoring some meaningless delimiters (space, tab, newline and ",").
The transitions in the fsa are defined in terms of these two imputs, string and delimiter, which can be null (not both in the same input): a null string occurs when there are two consecutive special delimiters, a null delimiter when two strings are separated by meaningless delimiters.
A possibility to differ input has been added to the basic input functions, to allow transitions that don't use their input (or part of). The input functions first look for differed input before reading the files.
The main functions for input are:
basic read, returns two values: the string and the special delimiter
tests if the first character is the given char. If true, returns t and removes the char from the stream. If not, returns nil.
peeks the first char of the stream
records delimiter and string in the differ input buffer, to be read by the next input function
When reading the files, errors can be reported (syntax errors
especially). To have a better indication of the place where an error
occurs in a definition, the definition while being read is echoed on
an other stream, which can be read when an error occurs, to show the
characters that have been read so far.
FSA
In the descriptions below, the transitions from one state to another are recorded with the possible inputs:
In the right side of the description, after ";" characters, are the
interpretations of the strings or the name of another fsa which is
called by the transition, in capital letters.
FAMILY DEFINITION FSA:
The family transfer definitions:
(0) --- string/ ---> (1) ; family name 1 (1) --- string: ---> (2) ; family name 2 (2) --- string: ---> (3) ; keyword (2) ------ { ------> (4) (3) ------ { ------> (4) (4) ---------------> (5) ; NODE PAIRS FSA (5) ------ . ------> end (5) ---------------> (5) ; TREE PAIRS FSA
The tree pairs and tree definition fsa's:
(0) ---------------> (1) ; TREE FSA (1) ------ / ------> (2) (2) ---------------> (3) ; TREE FSA (3) ------ : ------> (4) (3) --- string ----> (0.1) (4) --- string: ---> (5) ; keyword (tree def) (4) ------ { ------> end ; NODE PAIR FSA (family def) (4) ------ { ------> (6) ; NODE PAIR FSA (tree def) (5) ------ { ------> (6) ; NODE PAIR FSA (6) ------ . ------> end ; (in tree def)
The fsa for the tree definition file:
(0) --- string/ ---> end ; tree name (0) --- string: ---> end ; tree name (0) --- string{ ---> (1) ; tree name + FEATURE FSA (0) --- string[ ---> end ; tree name + DERIVATION FSA (0) --- string ----> end ; tree name (1) ------ / ------> end (1) ------ : ------> end (1) ------ [ ------> end ; DERIVATION FSA (1) --- string ----> end ; tree name for next tree pair
The fsa for the node pair mapping file:
(0) --- string: ---> (0.1) ; type specification (0) --- string/ ---> (2) ; node 1 (0) --- string# ---> (1) ; node 1 (0) ------ } ------> end (0.1) idem (0) except -> (0.1) (1) --- string/ ---> (2) ; tree id for node 1 (2) --- string ----> (0) ; node 2, other node pairs (2) --- string# ---> (3) ; node 2 (2) --- string} ---> end ; node 2 (3) --- string ----> (0) ; tree id for node 2, other node pairs (3) --- string} ---> end ; tree id for node 2
The fsa for the lexical transfer file:
(0) ---------------> (1) ; LEX ENTRY FSA (1) ------ / ------> (2) (2) ---------------> (3) ; LEX ENTRY FSA (3) ------ : ------> (4) (3) ------ . ------> end (4) --- string: ---> (4) ; keyword (4) --- string. ---> end ; keyword (4) ------ { ------> (5) ; NODE PAIRS FSA (5) ------ . ------> end
The fsa for the global lexical transfer file:
(0) ---------------> (1) ; LEX ENTRY FSA (1) ------ / ------> (2) (2) ---------------> (3) ; LEX ENTRY FSA (3) ------ : ------> (4) (4) ------ { ------> (5) ; NODE PAIRS FSA (5) ------ . ------> end
The fsa for the transfer of lexical entries:
(0) --- string ----> (1) ; lexical entry (0) --- string{ ---> end ; lexical entry + FEATURE FSA (lex def) (0) --- string/ ---> end ; lexical entry (0) --- string: ---> end ; lexical entry (0) --- string. ---> end ; lexical entry (1) --- string{ ---> (2) ; tree/family name + FEATURE FSA (1) --- string[ ---> end ; tree name + DERIVATION FSA (global def) (1) --- string/ ---> end ; tree/family name (1) --- string: ---> end ; tree/family name (1) --- string. ---> end ; tree/family name (2) ------ / ------> end (2) ------ : ------> end (2) ------ . ------> end (2) ------ [ ------> end ; DERIVATION FSA (global def)
The fsa to recognize the mapping between derivation trees:
(0) --- string ----> (1) ; lexical entry (1) --- stringX ---> (2) ; tree name (2) --- string ----> (0) ; new son (2) ------ { ------> (2) ; FEATURE FSA (2) ------ ( ------> (2) ; ADDRESS FSA (2) ------ # ------> (3) (2) ------ [ ------> (4) (2) ------ ] ------> end (3) --- stringX ---> (2) ; tree id (4) --- stringX ---> (0) ; new son (4) ------ ] ------> end
The fsa for tree address recognition:
(0) --- string ----> (0) ; number in address (0) --- string) ---> end ; last number in address (0) ------ ) ------> end ; null address (root)
as in equations or lexicon for grammars
SIMPLE FEATURE FSA:
A fsa for simple features:
(0) --- string/ ---> (1) ; head category 1 (1) --- string: ---> (2) ; head category 2 (2) ------ . ------> end (2) ---------------> (2) ; FEATURE PAIR FSA
A fsa to recognize feature pairs:
(0) ------ < ------> (1) (1) --- string ----> (1) ; element of path 1 (1) --- string> ---> (2) ; element of path 1 (2) ------ = ------> (3) (3) --- string/ ---> (4) ; value 1 (4) ------ < ------> (5) (5) --- string ----> (5) ; element of path 2 (5) --- string> ---> (6) ; element of path 2 (6) ------ = ------> (7) (7) --- string< ---> end ; value 2, feature pair next (7) --- string. ---> end ; value 2, end feature pairs
(file transfer-manager.lisp)
Given a source derivation node, the transfer data base is searched for all the transfer structures that are compatible with the elementary tree of the derivation node: combination of tree and lexical transfers and global transfers.
The transfer manager stores the transfer data according to their type, and provides the relevant access index:
Keywords are used for the combination of lexical transfers and tree transfers:
With non-lexicalized tree transfers, different transfers can be defined between two trees. For example, in the test data, we have two transfers between some trees of the families Tnx0Vnx1, <alpha>nx0Vnx1 for example.
The first one is a subject/subject and object/object transfer ("love/aimer"), and the second is a subject/object and object/subject transfer (AN EXAMPLE ?, there is "miss/manquer", but it is not the same families).
In such cases, it is necessary to make the two transfers different, as a lexical transfer between entries that belongs to this family pair will have to select one of them. This is the purpose of the keywords.
Each family (or tree) transfer will have ONE keyword associated with
it (by default it is the null keyword, for easy use), and the a
possibly ambiguous lexical transfer will have to specify some keywords
to tell the transfer manager which tree transfers has to be used with
this lexical transfer.
Algorithm (function find-possible-transfer-structures):
The main function for transfer is transfer-sentence, which is defined in the file transfer.lisp.
The translation is done in three successive phases :
(function parse-sentence, file transfer.lisp)
represents implicitly all possible target derivation trees for one source derivation tree. (function transfer-derivation-tree, file transfer.lisp)
morphological heads, and check the complete coherence (features and structural constraints). (function generate-target-derivation-trees, file generation.lisp)
(function generate-sentences, file sentences.lisp)
Given a source derivation tree, construct the target derivation lattice from the transfer structures associated with each source derivation node.
function transfer-derivation-tree:
(function propagate-features, file feature-deriv.lisp)
lattice isomorphic to the source derivation tree. (function build-transfer-structures, file transfer.lisp)
function build-transfer-structures:
derivation node, and select those that are compatible with the actual source data (feature restrictions, derivation matching for complex transfers). (function find-possible-transfer-structures, file transfer-manager.lisp)
(function transfer-features, file feature-transfer.lisp)
found for each son of the derivation node, check the local compatibilities between the parent and son transfer structures, to built the target derivation lattice by linking compatible target derivation nodes at the possible attachment nodes. different functions are used, depending on the type of transfer (simple or complex source, tree or feature target), in file transfer.lisp:
build-transfer-structure-simple-to-tree, build-transfer-structure-complex-to-tree, build-transfer-structure-simple-to-feature, build-transfer-structure-complex-to-feature, see file transfer.lisp for more details.
This phase is not really a generation phase, the term expansion phase should be more appropriate: expand the derivation lattice to find the implicit individual derivation trees in the target language.
The lattice is in fact an AND/OR graph, and in the current implementation is systematically traversed to find all solutions. From a node in the target lattice:
The algorithm is depth-first, top-down, and retrieves one solution at a time (function generate-target-derivation-trees):
This algorithm is very inefficient, does a lot of redundant computations. It could be optimized by storing some intermediate results. Also, probably a bottom-up, breadth-first algorithm would be more efficient, if all complete individual derivation trees would have to be available at the same time, as such an algorithm would construct all solutions in parallel. The implemented one (with some improvements) could be used if some processing was to be done on each derivation tree, which can be discarded afterwards.
Each time a solution is found (function target-derivation-tree-complete), the individual target derivation tree has to be validated:
function validate-target-derivation-tree:
attached (adjoined or substituted at the same node), and discard the derivation tree if it is not correct. (function check-structural-correctness)
derivation tree is discarded. (function propagate-features, file feature-deriv.lisp)
tree. If it is not possible, the derivation tree is discarded. (function morphological-generation, file morpho-gener.lisp)
The goal of the morphological generation is to find the relevant morphological entry (or entries) for each of the derivation nodes, given the features that are requested by the structure of the derivation tree itself and the features that are proposed by transfer from morphological heads in the source language. As dependencies can exist between morphological heads of different nodes, the morphological generation cannot be done independently for each derivation node.
function morphological-generation:
After the generation (expansion) phase, we have a set of derivation trees in the target language. Each elementary tree in each derivation tree has a list of the possible words within this derivation. The sentence generation is simply a projection of a derivation tree to a linear sequence of words (heads, terminal leaves of each elementary tree and attachments).
The generation is done by recursively traveling in the derivation tree, starting from the root of the derivation tree (function generate-sentence-internal). For each derivation node, the goal is to place in their relative order all elements that are associated to words: the sons of the derivation node and the heads (including terminals and attachments).
The feature propagation has two purposes. The first is to check the validity of the derivation tree by detecting feature clashes, and the second is to maintain all dependencies between features, which is necessary for morphological generation. We use therefore a destructive unification algorithm, to ensure the global dependencies.
The basic (recursive) function is propagate-features-internal, which operates on a derivation node:
This page is a modified version of the original written by Gilles Prigent (prigentg@lannion.cnet.fr). The page is currently maintained by Anoop Sarkar (anoop@linc.cis.upenn.edu).