First load the XTAG system.
The transfer module is loaded by the file /mnt/linc/extra/xtag/work/gprigent/transfer/ld-transfer.lisp, which loads all program files. (All file specifications below are relative to the main directory /mnt/linc/extra/xtag/work/gprigent/, denoted by ".".)
In the current version, the grammars must be loaded with the xtag
interface before loading the data files, which define the transfer
data.
Data Definition
The database for transfer between two languages is defined in a file which describes all grammars and transfer data files. The database is defined by a macro deftransfer with the syntax:
(deftransfer <language 1> <language 2> <declaration>...)
The syntax for the file is:
<declaration> ::= <parameters declaration> | <files declaration>
<parameters declaration> ::= (:default-pathname <default pathname>)
The global default pathname is used for all file specifications.
<files declaration> ::= (<keyword> <file or option>...)
<keyword> ::= :grammar-file-1 |     ; grammar file for <language 1>
              :grammar-file-2 |     ; grammar file for <language 2>
                                    ; for these two keywords,
                                    ; only the first file is relevant
              :family-files |       ; family transfer files
              :tree-files |         ; tree transfer files
              :lexicon-files |      ; lexical transfer files
              :global-files |       ; global transfer files
              :feature-files        ; feature transfer files
                                    ; see below for the different
                                    ; types of transfer data files
<file or option> ::= <file-name> |  ; file name or pathname
                     (:default-pathname <default pathname> :type <file type>)
                                    ; the two options are optional
The file names are interpreted relative to the local default pathname and file type; any remaining components are taken from the global default pathname above (see the Lisp function merge-pathnames). The default pathnames can be logical pathnames. The grammar files specify the grammars between which the transfer is defined: <language 1> and <language 2> are the names of the grammars (as defined by defgrammar <name>), interpreted as case-insensitive strings.
Here is an example (from file transfer-data/def-franglish.lisp):
(deftransfer francais2 english2
  (:default-pathname "/mnt/linc/extra/xtag/work/gprigent/")
  (:grammar-file-1 "francais2" (:default-pathname "francais/" :type "gram"))
  (:grammar-file-2 "english2" (:default-pathname "english/" :type "gram"))
  (:family-files "lexicon" (:default-pathname "transfer-data/" :type "family"))
  (:tree-files "lexicon" (:default-pathname "transfer-data/" :type "tree"))
  (:lexicon-files "lexicon" (:default-pathname "transfer-data/" :type "lex"))
  (:global-files "lexicon" (:default-pathname "transfer-data/" :type "global"))
  (:feature-files "lexicon" (:default-pathname "transfer-data/" :type "feat")))
With this example, the interpretation of the family files is:
global default pathname: "/mnt/linc/extra/xtag/work/gprigent/"
local default pathname:  "transfer-data/"
file type:               "family"
file name:               "lexicon"
==> real file name: "/mnt/linc/extra/xtag/work/gprigent/transfer-data/lexicon.family"
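The defaulting above can be reproduced with the standard Common Lisp function merge-pathnames, which the text refers to. This is a minimal sketch, assuming the module merges the file name first with the local default pathname and file type and then with the global default pathname; resolve-transfer-file is a hypothetical helper, not a function of the transfer package.

```lisp
;; Hypothetical helper (not part of the transfer package): resolve a data
;; file name with the local default and type first, then the global default.
(defun resolve-transfer-file (file-name global-default
                              &key local-default type)
  "Return the full pathname of FILE-NAME. Missing components are taken
first from LOCAL-DEFAULT and TYPE, then from GLOBAL-DEFAULT."
  (let* ((local (make-pathname :type type
                               :defaults (pathname (or local-default ""))))
         (partial (merge-pathnames file-name local)))
    ;; A relative directory such as "transfer-data/" is appended to the
    ;; absolute directory of the global default pathname.
    (merge-pathnames partial (pathname global-default))))

;; The family files of the francais2/english2 example:
(namestring
 (resolve-transfer-file "lexicon" "/mnt/linc/extra/xtag/work/gprigent/"
                        :local-default "transfer-data/" :type "family"))
;; => "/mnt/linc/extra/xtag/work/gprigent/transfer-data/lexicon.family"
```

The same call with "francais2", "francais/" and "gram" yields the grammar file of language 1.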
Note that the data files can be declared in any order, and can also be read
in any order.
Data files syntax
Each type of data file has its own syntax, depending on the type of transfer units it defines. In the current implementation, different types of data cannot be mixed in the same file.
All the files have common features:
When an error occurs while reading a file, an error message is displayed, and generally the current definition unit is discarded. For that purpose, the end of a definition has to be marked by a ".".
By convention, in the following notations a character between " " (as "<") is a terminal character of the language. A string between < > (as <node pair>) is a non-terminal, and a string between << >> (as <<tree id>>) is a terminal token, i.e. a string (sequence of characters) containing no delimiters (ignored or special).
I use | for alternatives, * for the Kleene star operator (any number), and + for one or more. A terminal or non-terminal between ( ) is optional.
The different syntaxes share some parts of their language, for
describing node pairs, feature restrictions and sons in a derivation
tree.
Node pairs:
<node pairs> ::= "{" <node pair>* "}"
<node pair>  ::= <type> <neg> <node desc> "/" <node desc>
<type>       ::= "" | "f" ":"
<neg>        ::= "" | "-"
<node desc>  ::= <<node name>> | <<node name>> "#" <<tree id>>
<type> "f" ":" is specified for node pairs which are relevant for
feature transfer. By default, the relevant node pair for feature
transfer is the head node pair. <neg> "-" is specified to remove an
inherited node pair, for family transfers (see below). Note that in
general, if a node of a pair is not found in a tree, the node pair is
ignored (a warning is displayed, see warnings below).
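To make the node pair notation concrete, here is a small reader for it. It is a hypothetical sketch (the module uses its own FSA-based reader), assuming, as in the examples, that pairs are separated by commas and contain no internal spaces; split-tokens, parse-node-desc, parse-node-pair and parse-node-pairs are invented names.

```lisp
;; Hypothetical reader for <node pairs>; not the module's own reader.
(defun split-tokens (string)
  "Split STRING on commas and whitespace, dropping empty pieces."
  (let ((tokens '()) (start nil))
    (loop for i from 0 to (length string)
          for c = (if (< i (length string)) (char string i) #\Space)
          do (if (member c '(#\, #\Space #\Tab #\Newline))
                 (when start
                   (push (subseq string start i) tokens)
                   (setf start nil))
                 (unless start (setf start i))))
    (nreverse tokens)))

(defun parse-node-desc (string)
  "Split a <node desc> such as \"VP_r#1\" into (NODE-NAME . TREE-ID);
TREE-ID is NIL when no \"#\" is present."
  (let ((hash (position #\# string)))
    (if hash
        (cons (subseq string 0 hash) (subseq string (1+ hash)))
        (cons string nil))))

(defun parse-node-pair (string)
  "Parse one <node pair>: optional \"f:\" type, optional \"-\", DESC/DESC."
  (let ((feature-p nil) (negated nil) (start 0))
    (when (and (>= (length string) 2) (string= "f:" string :end2 2))
      (setf feature-p t start 2))
    (when (and (< start (length string)) (char= (char string start) #\-))
      (setf negated t start (1+ start)))
    (let ((slash (position #\/ string :start start)))
      (unless slash (error "No / in node pair ~S" string))
      (list :feature-p feature-p :negated negated
            :source (parse-node-desc (subseq string start slash))
            :target (parse-node-desc (subseq string (1+ slash)))))))

(defun parse-node-pairs (string)
  "Parse a whole \"{...}\" node pair list into a list of plists."
  (mapcar #'parse-node-pair (split-tokens (string-trim "{} " string))))
```

For example, (parse-node-pair "f:V#1/V") returns (:FEATURE-P T :NEGATED NIL :SOURCE ("V" . "1") :TARGET ("V")).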
Feature restrictions:
<features> ::= "{" <feature desc>* "}"
for these features, I use the same syntax as for the equation
specifications in tree definitions in the grammar, with nodes and
top/bottom. Example : {N.b:<agr num>=pl, N.t:<agr>=NP.b<agr>}
Sons in a derivation tree: (complex transfers)
<derivation> ::= "[" <derivation node specification>* "]"
<derivation node specification> ::= <<lex entry>> <<tree name>> <option>* (<derivation>)
<option>  ::= <features> | "#" <<tree id>> | <address>
<address> ::= "(" <<number>>* ")"
Options: all are optional, and at most one of each type may be given.
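The following hypothetical sketch reads a single, flat derivation node specification such as \FAIRE\ vVP #1 (2), with the enclosing brackets already removed. It handles only the tree id and address options and rejects a repeated option; feature options and nested derivations are left out. The structure derivation-node and the functions split-on-spaces and parse-derivation-node are invented names.

```lisp
;; Hypothetical parser for a flat <derivation node specification>:
;; lex entry, tree name, then at most one "#id" and one "(n ...)" address.
(defstruct derivation-node lex-entry tree-name tree-id address)

(defun split-on-spaces (string)
  "Split STRING on spaces, dropping empty pieces."
  (let ((words '()) (start nil))
    (loop for i from 0 to (length string)
          for c = (if (< i (length string)) (char string i) #\Space)
          do (if (char= c #\Space)
                 (when start
                   (push (subseq string start i) words)
                   (setf start nil))
                 (unless start (setf start i))))
    (nreverse words)))

(defun parse-derivation-node (string)
  (let* ((open (position #\( string))
         (close (and open (position #\) string :start open))))
    (when (and open (not close))
      (error "Unclosed address in ~S" string))
    (let ((address (and open
                        (mapcar #'parse-integer
                                (split-on-spaces (subseq string (1+ open) close)))))
          ;; Remove the address so the remaining words are name, tree, options.
          (words (split-on-spaces
                  (if open
                      (concatenate 'string (subseq string 0 open)
                                   " " (subseq string (1+ close)))
                      string)))
          (tree-id nil))
      (when (< (length words) 2)
        (error "Lex entry and tree name expected in ~S" string))
      (dolist (word (cddr words))
        (if (and (> (length word) 1) (char= (char word 0) #\#) (null tree-id))
            (setf tree-id (subseq word 1))
            (error "Unsupported or repeated option ~S in ~S" word string)))
      (make-derivation-node :lex-entry (first words) :tree-name (second words)
                            :tree-id tree-id :address address))))
```

Applied to "pas Ad (3)" (from the global example), it returns a node with no tree id and the address (3).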
Family files: (keyword :family-files)
The family files describe the transfer between families, from language 1 to language 2. Family names are required to begin with a "T".
The transfer is described by defining tree pairs between the trees of the two languages. The tree names need not be complete; they only need to identify a tree in the family unambiguously.
A global node pair list is given, and is inherited by all the tree pairs (using "-" if a node pair has to be removed).
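The inheritance rule can be pictured as a set operation. The sketch below is hypothetical (effective-node-pairs is not part of the transfer package): it assumes node pairs can be compared as plain strings, and that a local pair written "-X/Y" removes X/Y from the inherited list while any other local pair is added to it.

```lisp
;; Hypothetical sketch of node pair inheritance for one tree pair.
(defun effective-node-pairs (global local)
  "Start from the family's GLOBAL node pairs; a LOCAL entry \"-X/Y\"
removes X/Y, any other LOCAL entry is added if not already present."
  (let ((result (copy-list global)))
    (dolist (entry local result)
      (if (and (plusp (length entry)) (char= (char entry 0) #\-))
          (setf result (remove (subseq entry 1) result :test #'string=))
          (pushnew entry result :test #'string=)))))
```

With the global pairs of Tnx0Vnx1 below and a hypothetical local list ("S_q/S_q" "NP/NP" "-V/VP"), the result keeps five inherited pairs, drops V/VP and adds the two local pairs.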
Syntax:
<family unit> ::= <family pair> ":" <keyword> <node pairs> <tree pairs>* "."
<family pair> ::= <<family name>> "/" <<family name>>
<keyword>     ::= "" | <<key>> ":"
<tree pairs>  ::= <tree pair>* ":" <node pairs>
<tree pair>   ::= <tree desc> "/" <tree desc>
<tree desc>   ::= <<tree name>> (<features>) (<derivation>)
<node pairs>, <features> and <derivation>: same as above.
Two examples, from file transfer-data/lexicon.family, one simple and one including a complex transfer:
Tnx0Vnx1/Tnx0Vnx1:                                    ; no keyword
  {S_r/S_r, VP/VP, V/V, V/VP, NP_0/NP_0, NP_1/NP_1},  ; global node pairs
  nx0Vnx1/nx0Vnx1: {},
  W0/W0, W1/W1nx0, W1-inv/W1nx0: {S_q/S_q, NP/NP},
  R0/R0, R1/R1nx0, R1-inv/R1nx0: {NP_r/NP_r, NP_f/NP_f, S_q/S_q, NP/NP}.

Tnx0V/Tnx0Vnx1:
  {S_r/S_r, VP/VP, V/V, V/VP, NP_0/NP_1},
  nx0V/nx1V: {},
  W0/W1nx1: {S_q/S_q, NP/NP, V/S_q},
  R0/R1nx1: {NP_r/NP_r, NP_f/NP_f, S_q/S_q, NP/NP},
  nx1Vnx0 [\FAIRE\ vVP #1 (2)] / nx0Vnx1: {NP_1/NP_0, VP_r#1/VP, f:V#1/V}.
; intransitive: the snow melts / la neige fond
; transitive: the sun melts the snow / le soleil fait fondre la neige
Tree files: (keyword :tree-files)
The tree files describe the transfer between individual trees, from language 1 to language 2. Tree names are required to begin either with <alpha> (code character 2) or <beta> (code character 3).
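Together with the family rule above (family names begin with "T"), this gives the first-character check reported as "<type> Definition Error: first character <char> not valid". A minimal sketch, with the hypothetical function valid-unit-name-p:

```lisp
;; Hypothetical first-character check for family and tree unit names.
(defun valid-unit-name-p (unit-type name)
  "True if NAME may start a UNIT-TYPE unit (:family or :tree)."
  (and (plusp (length name))
       (let ((code (char-code (char name 0))))
         (ecase unit-type
           (:family (= code (char-code #\T)))    ; family names: "T..."
           (:tree (or (= code 2) (= code 3)))))))  ; alpha or beta trees
```

A tree name written without its alpha or beta character, such as "NPdn", is rejected by this check.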
Syntax:
<tree unit> ::= <tree pair> ":" <keyword> <node pairs> "."
<tree pair>, <keyword> and <node pairs>: same as above.
An example, from file transfer-data/lexicon.tree :
NPdn {N.b:<agr num>=pl} / NXn {N.b:<agr num>=plur} : {NP/NP, N/N}.
Lexicon files: (keyword :lexicon-files)
The lexicon files describe the transfer between lexical entries in language 1 and language 2. Note that these are NOT morphological classes, but are more related to the meaning (semantic entry). (THIS DISTINCTION IS NOT IMPLEMENTED IN THE CURRENT VERSION OF THE GRAMMAR, AND IS NOT COMPLETELY SPECIFIED. THE CURRENT IMPLEMENTATION OF THE TRANSFER MODULE DOES NOT USE THIS DISTINCTION)
The transfer data for a given lexicalized tree in the source language is found by combining generic tree (or family) transfer data with the specific lexical transfer data, checking the compatibility constraints.
Syntax:
<lexical unit>   ::= <lexical entry> "/" <lexical entry> <lex parameters> "."
<lexical entry>  ::= <<lex entry>> (<<tree or family name>>) (<features>)
<lex parameters> ::= "" |                              ; no parameters
                     ":" <keywords> |                  ; keywords only
                     ":" <node pairs> |                ; node pairs only
                     ":" <keywords> ":" <node pairs>   ; keywords and node pairs
<keywords>       ::= <<key>> | <<key>> ":" <keywords>
<node pairs>: as above.
<<tree or family name>> is interpreted as a constraint on the possible trees for the lexical entry
Some examples, from file transfer-data/lexicon.lex, with keywords or feature restrictions:
\FERMIER\ {N.b:<agr gen>=masc} / \FARMER\.   ; feature restrictions
\MANQUER\ / \MISS\ :inverse.                  ; one keyword
Global files: (keyword :global-files)
With the previous data files (family, tree and lexicon), the transfer data is split into two different parts, one for trees (or families) and one for lexical entries. Some transfer data may not fit this separation; the purpose of the global data files is to describe a transfer from lexicalized tree to lexicalized tree, globally.
Syntax:
<global unit>       ::= <lexicalized entry> "/" <lexicalized entry> ":" <node pairs> "."
<lexicalized entry> ::= <<lex entry>> <<tree name>> (<features>) (<derivation>)
<node pairs>, <features> and <derivation>: as above.
An example, from file transfer-data/lexicon.global, with feature restrictions, a derivation, and no node pairs:
ne negV {V_f.t:<temps>=pres} [pas Ad (3)] / \DO\ vVX {V.b:<neg>=+, V.b:<tense>=pres} : {}.
Feature files: (keyword :feature-files)
The purpose of these files is to describe the feature transfer between nodes in a transfer. In the current implementation, features are only transferred between heads of trees and, in case of a complex transfer (i.e. with a derivation tree), between heads of the different trees in the derivation.
We define, for each head category correspondence, the associated feature transfer, in terms of path/value to path/value transfer. This is a very simple choice, and may have to be extended in some cases.
Syntax:
<feature unit> ::= <<category>> "/" <<category>> ":" <feature pair>* "."
<feature pair> ::= <path-value> "/" <path-value>
<path-value>   ::= "<" <<path>>+ ">" "=" <<value>>
An example, from file transfer-data/lexicon.feat :
N/N:
  <agr num> = sing  / <agr num> = sing
  <agr num> = pl    / <agr num> = plur
  <agr gen> = masc  / <agr gen> = masc
  <agr gen> = fem   / <agr gen> = fem
  <agr pers> = 1    / <agr pers> = 1
  <agr pers> = 2    / <agr pers> = 2
  <agr pers> = 3    / <agr pers> = 3
.
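The path/value to path/value correspondence can be read as a lookup table. This is a hypothetical sketch (transfer-features and *n-n-feature-pairs* are invented names) encoding the N/N unit above, with paths as lists of strings. It assumes that a source feature with no matching pair is simply not transferred, which the text does not specify.

```lisp
;; Each entry: (SOURCE-PATH SOURCE-VALUE TARGET-PATH TARGET-VALUE),
;; transcribed from the N/N unit of lexicon.feat.
(defparameter *n-n-feature-pairs*
  '((("agr" "num") "sing" ("agr" "num") "sing")
    (("agr" "num") "pl" ("agr" "num") "plur")
    (("agr" "gen") "masc" ("agr" "gen") "masc")
    (("agr" "gen") "fem" ("agr" "gen") "fem")
    (("agr" "pers") "1" ("agr" "pers") "1")
    (("agr" "pers") "2" ("agr" "pers") "2")
    (("agr" "pers") "3" ("agr" "pers") "3")))

(defun transfer-features (source-features feature-pairs)
  "SOURCE-FEATURES is a list of (PATH . VALUE) on the source head.
Return the target (PATH . VALUE) list; unmatched features are dropped."
  (loop for (path . value) in source-features
        for pair = (find-if (lambda (p) (and (equal (first p) path)
                                             (equal (second p) value)))
                            feature-pairs)
        when pair
          collect (cons (third pair) (fourth pair))))
```

For instance, French <agr num> = pl on an N head becomes English <agr num> = plur.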
Loading a transfer data definition file, for two given languages, defines a transfer manager, which stores all data transfer information for the transfer between the two languages.
More than one transfer manager can be defined, but only one is
active.
Transfer Application
This part will show how to use the transfer application: from one sentence of one language as input, find all possible sentences in the target language (and their derivation trees).
All functions are defined in and exported from the package :transfer. To use them, either use the package (use-package :transfer) or call the functions with the prefix transfer:<function>.
After loading a data file, the active transfer manager is the transfer manager just created, and the default direction for transfer is from language 1 to language 2 (see above).
To change the transfer manager or the direction of transfer, two functions are used:
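define-transfer-languages (source-language target-language)
The active transfer manager, and the direction of transfer, are set from the given source and target languages.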
reverse-transfer-languages ()
The direction of transfer for the active transfer manager is reversed.
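The bookkeeping behind these two functions can be sketched as follows. This is a hypothetical model, not the module's code: the structure transfer-manager and the functions add-transfer-manager, select-transfer-languages, reverse-direction and source-and-target are invented, and it assumes that one manager serves both directions between its two languages, compared case-insensitively.

```lisp
;; Hypothetical model of transfer managers and the direction of transfer.
(defstruct transfer-manager language-1 language-2 (direction :forward))

(defvar *transfer-managers* '())
(defvar *active-manager* nil)

(defun add-transfer-manager (language-1 language-2)
  "Like loading a data file: the new manager becomes active, 1 -> 2."
  (let ((manager (make-transfer-manager :language-1 language-1
                                        :language-2 language-2)))
    (push manager *transfer-managers*)
    (setf *active-manager* manager)))

(defun select-transfer-languages (source target)
  "Analogue of define-transfer-languages."
  (dolist (m *transfer-managers*
             (error "No transfer manager found for languages ~A and ~A."
                    source target))
    (let ((l1 (transfer-manager-language-1 m))
          (l2 (transfer-manager-language-2 m)))
      (cond ((and (string-equal source l1) (string-equal target l2))
             (setf (transfer-manager-direction m) :forward)
             (return (setf *active-manager* m)))
            ((and (string-equal source l2) (string-equal target l1))
             (setf (transfer-manager-direction m) :reverse)
             (return (setf *active-manager* m)))))))

(defun reverse-direction ()
  "Analogue of reverse-transfer-languages."
  (unless *active-manager* (error "No transfer manager is active."))
  (setf (transfer-manager-direction *active-manager*)
        (if (eq (transfer-manager-direction *active-manager*) :forward)
            :reverse
            :forward)))

(defun source-and-target ()
  "Return the current source and target languages as two values."
  (let ((m *active-manager*))
    (if (eq (transfer-manager-direction m) :forward)
        (values (transfer-manager-language-1 m) (transfer-manager-language-2 m))
        (values (transfer-manager-language-2 m) (transfer-manager-language-1 m)))))
```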
The next functions are used to set the transfer parameters and to process a sentence:
set-transfer-parameters (&key (trace :no-change) (warnings :no-change)
                              (inspect :no-change))
o To set the trace, warnings and inspect options. An option left at its default value, :no-change, keeps its current setting.
transfer-sentence (&optional sentence &key source target (trace :no-change)
                   (warnings :no-change) (inspect :no-change))
o To translate a given sentence. If no sentence is given, the previous one, if any, is used.
o Source and target are used to specify the languages for transfer.
o Trace, warnings and inspect are the same as above.
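The :no-change default lets a caller update one option without restating the others. A minimal sketch of that convention, assuming options are kept in a property list; *transfer-options* and set-options are hypothetical names, and the initial values are invented for the example.

```lisp
;; Hypothetical model of the :no-change keyword convention.
(defvar *transfer-options* (list :trace nil :warnings t :inspect nil))

(defun set-options (&key (trace :no-change) (warnings :no-change)
                         (inspect :no-change))
  "Update only the options that were actually passed."
  (loop for (key value) on (list :trace trace :warnings warnings
                                 :inspect inspect)
          by #'cddr
        unless (eq value :no-change)
          do (setf (getf *transfer-options* key) value))
  *transfer-options*)
```

Calling (set-options :trace t) turns tracing on and leaves warnings and inspect untouched.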
Warnings
When processing a sentence, or when defining the transfer database,
some warnings may be printed in the lisp window (if the option is
selected). Here are the warnings and their possible interpretations:
Database acquisition
Error in transfer definition file: the keyword <keyword> is unknown (ignored).
in deftransfer, the only known keywords are :default-pathname, :grammar-file-1, :grammar-file-2, :family-files, :tree-files, :lexicon-files, :global-files and :feature-files.
Error in definition: <invalid transfer definition unit> <error description>
<invalid transfer definition unit> is the part of the definition which has been read correctly so far, as it is in the file. When such an error occurs, the current definition is discarded.
Different <error description> are possible:
<type> Definition Error: first character <char> not valid.
as defined above, only T is allowed as a first character in family units, and alpha (code char 2) or beta (code char 3) in tree units.
Syntax Error in <fsa> at state <sta>: string: <str>, delimiter: <delim>.
the transitions in the fsa in a certain state are defined according to the string and delimiter read. A syntax error occurs when no transition is possible with the given string and delimiter.
FSA Error in <fsa>: state <state> unknown.
for debugging purposes only, if the fsa is in an unknown state (bug in the fsa).
Feature Error, definition discarded.
the feature fsa is defined elsewhere (feature module of TAG). Its errors are caught by this module (This fsa has its own error messages).
Error in <fsa> at state <state>: string <string>, delimiter: <delimiter> <specific error description>
when reading a file, some specific (semantic) errors can be detected. The only errors detected here are incoherence of the tree ids in a derivation with the tree ids in the node pair list. Different <specific error description> are possible:
Unknown tree id for node <node> <<id>>.
No tree id allowed for node <node> <<id>>.
for both error messages, when reading node pairs, if a tree id is found, it has to be in the derivation to which the node pair is linked. In the first case, tree ids are allowed but the tree id found is not. In the second, no tree ids are allowed.
The tree id <id> is already used.
when defining a derivation, no two trees can have the same id.
??? Unknown error : <error type>.
for debug only: other types of error-messages are not valid.
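The tree id checks described above can be sketched as follows. This is a hypothetical version: check-tree-ids and its argument format are invented, and the message texts only approximate the ones listed here.

```lisp
;; Hypothetical sketch of the tree id consistency checks.
(defun check-tree-ids (derivation-ids node-uses)
  "DERIVATION-IDS: tree ids declared in a derivation (NIL if none).
NODE-USES: list of (NODE . TREE-ID) read from the linked node pairs.
Return the error messages in the order they were detected."
  (let ((errors '()) (seen '()))
    ;; No two trees of a derivation may share an id.
    (dolist (id derivation-ids)
      (if (member id seen :test #'string=)
          (push (format nil "The tree id ~A is already used." id) errors)
          (push id seen)))
    ;; Every tree id used in a node pair must be declared in the derivation.
    (dolist (use node-uses)
      (destructuring-bind (node . id) use
        (cond ((null derivation-ids)
               (push (format nil "No tree id allowed for node ~A ~A." node id)
                     errors))
              ((not (member id derivation-ids :test #'string=))
               (push (format nil "Unknown tree id for node ~A ~A." node id)
                     errors)))))
    (nreverse errors)))
```

The complex tree pair of the family example declares #1 and uses it in VP_r#1/VP and f:V#1/V, so it passes both checks.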
In transfer definition <item1> / <item2> <keywords>: <error description> <error action>
when defining the transfer units, as the grammars are already loaded, the existence of trees, families, lexical entries, etc. can be checked. In the error messages below, the number <n> between / / is either 1 (for language 1) or 2 (for language 2). Different <error description> are possible:
unknown family name: <name> /<n>/.
the given name is not a family name for the grammar.
no tree for specification <tree name> in family <family name> /<n>/.
in family definitions, the tree names can be reduced to the information needed to point to a tree unambiguously. This error happens when no actual tree has been found in the family.
no transfer tree-pair for tree <tree name> in family <family name> /<n>/.
warns if a tree in the family is not part of any tree pair. Any sentence using this tree cannot be translated.
unknown tree name: <name> /<n>/.
in a tree transfer definition, no tree has been found in the grammar for the given name.
unknown lexical entry: <string> /<n>/.
in lexical and global transfer units, the given string does not exist as a lexical entry (ie morphological class in the current implementation) in the grammar.
unknown lexicalized tree: <lex> [<tree name>] /<n>/.
in global transfer units, or in family or tree transfer units, when a derivation is described, all the combinations lexical entry + tree have to exist in the grammar.
for all those errors, one of the following <error action> is given, depending on the case:
==> transfer unit ignored.
==> tree-pair ignored.
Defining languages
No transfer manager found for languages <source> and <target>.
there is no transfer manager defined with the given source and target languages.
The target language is not defined.
The source language is not defined.
The source and target languages are not defined.
the source and target languages must both be specified.
No transfer manager is active.
cannot reverse the direction of transfer if no transfer manager is active. Use define-transfer-languages to set the active transfer manager.
When processing a sentence, all the messages for defining languages (see above) may also be displayed.
Transfer Database access: No transfer structure found for elementary tree : <elementary tree>.
no possible transfer structure for the given elementary tree has been found in the database: either there was none, or none of the possible ones was valid.
Build transfer structure: no valid transfer structure for source-derivation-node: <source derivation node>.
for complex transfer, the match between the actual source derivation tree and the derivation tree defined in the transfer database is completed while building the target derivation. With this error, no database transfer structure has been matched with the source data.
Select node pairs: no relevant node pair found. source-derivation-node: <source derivation node> address: <address>, source-node: <source node> node-pairs: <node pairs>.
In the source derivation, a son of the <source derivation node> was attached at some address, and no node pair can be found to transfer the attachment. This helps to find inconsistencies in the transfer database.
The son <son> of the feature transfered <parent> was not root-adjoined: it is not transfered.
<son> and <parent> are two derivation nodes in the source derivation tree, and the parent is transferred into a feature. If the son is not root-adjoined, there is no way to know what to do with it, so it is ignored.
Validate target derivation: target derivation tree not valid for structural correctness: <target-derivation-tree>.
while checking for structural correctness, an inconsistency has been found: either two adjunctions or two substitutions at the same node. This warning immediately follows one of the next two, depending on the error:
Validate target derivation: target derivation tree not valid at <derivation-node>: two substitutions at the same node: <son-1> and <son-2>.
(or: two adjunctions at the same node: <son-1> and <son-2>.)
Validate target derivation: target derivation tree not valid for feature propagation: <target-derivation-tree>.
while propagating the features in the target derivation, a clash has occurred; this derivation tree is discarded.
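The structural correctness check (no two substitutions and no two adjunctions at the same node) can be sketched as a pairwise comparison of the sons of one target derivation node. In this hypothetical sketch, find-structural-clashes and the (NAME OPERATION ADDRESS) representation of sons are assumptions, and "the same node" is taken to mean the same address in the parent tree.

```lisp
;; Hypothetical structural check on the sons of one derivation node.
;; Each son is (NAME OPERATION ADDRESS), OPERATION being :substitution
;; or :adjunction and ADDRESS a list of integers.
(defun find-structural-clashes (sons)
  "Return (OPERATION SON-1 SON-2) for every two SONS attached by the
same operation at the same address."
  (let ((clashes '()))
    (loop for (son . rest) on sons
          do (dolist (other rest)
               (when (and (eq (second son) (second other))
                          (equal (third son) (third other)))
                 (push (list (second son) (first son) (first other))
                       clashes))))
    (nreverse clashes)))
```

A substitution and an adjunction at the same address do not clash in this sketch; only two attachments of the same kind do.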
Validate target derivation: target derivation tree not valid for morphological generation: <target-derivation-tree>.
a valid morphological form could not be found for every elementary tree.
Validate substitution nodes: no substitution at nodes <subst-nodes> in <target-derivation-node>.
warns that in the target derivation, some tree has a substitution node with no tree substituted. This is not a fatal error; in a future version, the substitution tree could be generated.
Morphological generation: no word found for <lexical entry>, compatible with the derived features.
the feature propagation in the derivation tree has imposed features on the head that are not compatible with any of the possible morphological words: the derivation tree itself is invalid and discarded.
Morphological generation: no solution found.
there is no solution to the morphological generation: due to feature propagation, possible words are excluding other words, and no global solution can be found: the derivation is discarded.
The next two warnings are issued to trace the behavior of the optimization algorithm: they signal when a solution is found with a score lower than the maximal score (0), and when better solutions are found afterwards. They are not really relevant to the user and could be suppressed.
Solution without maximal score: <global-score>.
New solution with better score than previous one (<old score>/<new score>).
This page is a modified version of the original written by Gilles Prigent (prigentg@lannion.cnet.fr). The page is currently maintained by Anoop Sarkar (anoop@linc.cis.upenn.edu).