European Corpus Initiative/Multilingual Corpus I (ECI/MCI)

Taxonomy :

The European Corpus Initiative (ECI) was founded to oversee the acquisition and preparation of a large multilingual corpus. It supports existing and planned national and international efforts to design, collect and publish large-scale multilingual written and spoken corpora. ECI has produced the Multilingual Corpus I (ECI/MCI) of over 98 million words, covering most of the major European languages as well as Turkish, Japanese, Russian, Chinese, Malay and others. The primary focus is on textual material of all kinds, including transcriptions of spoken material.


- Other info -

Language(s) :

Swedish
Russian
Norwegian
Uzbek
Portuguese
Turkish
Dutch; Flemish
Czech
Estonian
English
Albanian
Chinese
Bulgarian
Scottish Gaelic
French
Modern Greek (1453-)
German
Japanese
Italian
Lithuanian
Latin
Spanish; Castilian
Malay (macrolanguage)
Danish
Serbian

Types : multilingual corpus
Domain : Newspaper texts; Leiden Corpus; ILO Bulletin
Size : 98,000,000 Words
Developer : European Corpus Initiative project
Availability : Registration required
Update : 05/02/2015