EMEA – European Medicines Agency
Taxonomy : Corpus
This is a parallel corpus compiled from PDF documents published by the European Medicines Agency.
- Other info -
Language(s) :
Bulgarian
Czech
Danish
German
Greek
English
Spanish
Estonian
Finnish
French
Hungarian
Italian
Lithuanian
Latvian
Maltese
Dutch
Polish
Portuguese
Romanian
Slovak
Slovenian
Swedish
Types : multilingual corpus
Domain : Medicine
Healthcare
Size : total number of files: 41,957
total number of tokens: 311,650,000
total number of sentence fragments: 26,510,000
Developer : Jörg Tiedemann (OPUS)
Availability : Free
Update : 2012