UCCA-Annotated French-English Parallel Corpus
This is the corpus (version 1.0) presented in the paper:
Conceptual Annotations Preserve Structure Across Translations: A French-English Case Study.
Elior Sulem, Omri Abend and Ari Rappoport.
ACL 2015 Workshop on Semantics-Driven Statistical Machine Translation (S2MT) (full paper).
The corpus is released under the Creative Commons Attribution-ShareAlike 3.0 Unported license.
The corpus contains 154 pairs of aligned French-English passages
annotated with the UCCA (Universal Conceptual Cognitive Annotation)
scheme.
Each of the two monolingual parts of the corpus
contains 583 sentences, corresponding to 12.5K tokens in English
and 13.1K tokens in French.
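As a rough check on the figures above, the average sentence and passage lengths can be derived directly from the stated counts. The following is a minimal Python sketch; because the token counts are given rounded to the nearest hundred, the averages are only approximate, and the constant and function names are illustrative, not part of the corpus release.

```python
# Corpus statistics as stated in this README. Token counts are rounded
# (12.5K, 13.1K), so every derived average is approximate.
PASSAGES = 154
SENTENCES_PER_SIDE = 583
TOKENS = {"English": 12_500, "French": 13_100}


def average(total: float, count: int) -> float:
    """Return total / count rounded to two decimal places."""
    return round(total / count, 2)


if __name__ == "__main__":
    # Sentences per passage is the same for both languages.
    print(f"Sentences per passage: {average(SENTENCES_PER_SIDE, PASSAGES)}")
    for language, tokens in TOKENS.items():
        print(f"{language}: {average(tokens, SENTENCES_PER_SIDE)} tokens/sentence, "
              f"{average(tokens, PASSAGES)} tokens/passage")
```

Running it gives about 3.79 sentences per passage, 21.44 tokens per sentence in English and 22.47 in French.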
The corpus is based on the first
five chapters of "Twenty Thousand Leagues Under the Sea" by
Jules Verne. The English translation is available on Project Gutenberg
and on Wikisource, and is reproduced here under the Creative Commons
license.
The most up-to-date version can be found here for the English part and here for the French part.