ACCURAT corpus of Wikipedia texts

Taxonomy :

The corpus contains comparable texts from Wikipedia for 12 language pairs.


- Other info -

Language(s) :

English - Croatian
English - Greek
English - Estonian
English - Latvian
English - Lithuanian
English - Romanian
English - Slovenian
Greek - Romanian
Latvian - Lithuanian
Romanian - German
Romanian - Lithuanian
German - English

Types : comparable corpus
Domain : Wikipedia
Size : Each language in a pair has the same size.

English - Croatian (22,137 texts)
English - Greek (4,230 texts)
English - Estonian (20,621 texts)
English - Latvian (6,455 texts)
English - Lithuanian (13,906 texts)
English - Romanian (58,622 texts)
English - Slovenian (28,004 texts)
Greek - Romanian (841 texts)
Latvian - Lithuanian (1,541 texts)
Romanian - German (16,246 texts)
Romanian - Lithuanian (2,209 texts)
German - English (149,891 texts)

Developer : Tilde
Availability : Free
Update : 06/30/2012