EMEA – European Medicines Agency

Taxonomy :

This is a parallel corpus compiled from PDF documents from the European Medicines Agency.


- Other info -

Language(s) :

Bulgarian
Czech
Danish
German
Greek
English
Spanish
Estonian
Finnish
French
Hungarian
Italian
Lithuanian
Latvian
Maltese
Dutch
Polish
Portuguese
Romanian
Slovak
Slovenian
Swedish

Types : multilingual corpus
Domain : Medicine, Healthcare
Size : total number of files: 41,957; total number of tokens: 311,650,000; total number of sentence fragments: 26,510,000
Developer : Jörg Tiedemann (OPUS)
Availability : Free
Update : 2012