Digital Corpus of the European Parliament (DCEP)

Taxonomy :

The Digital Corpus of the European Parliament (DCEP) contains the majority of the documents published on the European Parliament’s official website. It comprises a variety of document types, from press releases to session and legislative documents related to the European Parliament’s activities and bodies. The current version of the corpus contains documents produced between 2001 and 2012.


- Other info -

Language(s) :

Bulgarian
Spanish
Czech
Danish
German
Estonian
Greek
English
French
Irish
Croatian
Italian
Latvian
Lithuanian
Hungarian
Maltese
Dutch
Polish
Portuguese
Romanian
Slovak
Slovenian
Finnish
Swedish

Types : multilingual corpus
Domain : European Parliament documents
Size : 1.5 million documents; 1.37 billion words; 7.7 million English segments. The best-represented language by word count is English (103,458,996 words); French and Spanish each fall short of that figure by less than 10%.
Developer : Machine Translation team of the European Parliament's Directorate-General for Translation (DGTRAD)
Availability : Free
Update : 03/11/2015