EU bookshop

Taxonomy :

Corpus of documents from the EU bookshop – https://bookshop.europa.eu/en/home/


- Other info -

Language(s) :

Bulgarian
Czech
Danish
German
Greek
English
Spanish
Estonian
Finnish
French
Irish
Croatian
Hungarian
Italian
Lithuanian
Latvian
Maltese
Dutch
Polish
Portuguese
Romanian
Slovak
Slovenian
Swedish
Arabic
Belarusian
Bosnian
Catalan
Chinese
Welsh
French_Belgium
Scots Gaelic
Icelandic
Japanese
Luxembourgish
Macedonian
Norwegian
Dutch_Belgium
Russian
Serbo-Croatian
Albanian
Serbian
Swahili
Traditional Chinese
Turkish
Ukrainian

Types : multilingual corpus
Domain : Corpus of documents from the EU bookshop
Size : total number of files: 135,785; total number of tokens: 3.60G; total number of sentence fragments: 173.20M
Developer : LetsMT project
Availability : Free
Update : 2014