United Nations Parallel Corpus

Taxonomy :

The corpus consists of manually translated UN documents dating from 1990 to 2014 (a 25-year span) in the six official UN languages. It is intended to give access to multilingual language resources and to support research in natural language processing tasks, including machine translation. For convenience, the corpus is also available pre-packaged as language-specific bi-texts and as a six-language parallel corpus subset.


- Other info -

Language(s) :

Six official languages of the United Nations:
Arabic
Chinese
English
French
Russian
Spanish

Types : parallel corpus
Domain : official records and other parliamentary documents of the United Nations that are in the public domain
Size : 799,276 documents
Developer : Michał Ziemski, Marcin Junczys-Dowmunt, Bruno Pouliquen
Availability : Free
Update : 2016