UCCA-Annotated French-English Parallel Corpus



This is the corpus (version 1.0) presented in the paper:

Conceptual Annotations Preserve Structure Across Translations: A French-English Case Study.
Elior Sulem, Omri Abend and Ari Rappoport,
ACL 2015 Workshop on Semantics-Driven Statistical Machine Translation (S2MT) (full paper).
[Paper: pdf]

The corpus is released under the Creative Commons Attribution-ShareAlike 3.0 Unported license.

The corpus contains 154 pairs of aligned French-English passages annotated with the UCCA annotation scheme.
Each of the two monolingual parts of the corpus contains 583 sentences, corresponding to 12.5K tokens in English and 13.1K tokens in French.
The corpus is based on the first five chapters of "Twenty Thousand Leagues Under the Sea" by Jules Verne. The English translation is available from Project Gutenberg and on Wikisource, and is reproduced here under the Creative Commons license.

The most up-to-date version can be found here for the English part and here for the French part.