Croatian-English Parallel Web Corpus

Taxonomy :

The Croatian-English Parallel Web Corpus is a collection of parallel Croatian-English texts collected on the basis of hrWaC. The corpus was collected automatically by finding online documents in English that are parallel to documents already crawled in hrWaC. A parallelity score was calculated for each text pair, and the selection threshold was set empirically to 0.52 on a scale from 0 to 1. The resulting collection of parallel-text candidates was then manually inspected to identify genuinely parallel texts.
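
The threshold step described above can be illustrated with a minimal sketch. It only shows the selection of candidates by score; how the parallelity score itself is computed is not stated in this record. The names `CandidatePair` and `select_candidates`, the example URLs, and the choice to treat the threshold as inclusive are assumptions made for illustration, not the developers' implementation.

```python
from typing import Iterable, List, NamedTuple

# Selection threshold reported for the corpus (scale 0 to 1).
THRESHOLD = 0.52


class CandidatePair(NamedTuple):
    """Hypothetical record for one Croatian-English document pair."""
    hr_url: str
    en_url: str
    score: float  # parallelity score in [0, 1]; its computation is not specified here


def select_candidates(pairs: Iterable[CandidatePair],
                      threshold: float = THRESHOLD) -> List[CandidatePair]:
    """Keep pairs whose score reaches the threshold.

    Assumption: the comparison is inclusive (>=). The selected pairs are
    only candidates and would still need manual inspection.
    """
    return [p for p in pairs if p.score >= threshold]


if __name__ == "__main__":
    # Invented example data, not taken from the corpus.
    pairs = [
        CandidatePair("http://example.hr/a", "http://example.hr/en/a", 0.81),
        CandidatePair("http://example.hr/b", "http://example.hr/en/b", 0.52),
        CandidatePair("http://example.hr/c", "http://example.hr/en/c", 0.30),
    ]
    for pair in select_candidates(pairs):
        print(pair.hr_url, pair.en_url, pair.score)
```

In this sketch the automatic filter only narrows the set; the manual inspection mentioned above remains a separate step.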


- Other info -

Language(s) :

Croatian
English

Types : parallel corpus
Domain : translated texts collected from the internet
Size : 99,001 Units
Developer : University of Zagreb, Faculty of Humanities and Social Sciences
Availability : Free
Update : 03/09/2016