Croatian-English Parallel Web Corpus
The Croatian-English Parallel Web Corpus is a collection of parallel Croatian-English texts crawled from hrWaC. The corpus was collected automatically by finding online English documents that are parallel to documents already crawled in hrWaC. The parallelity of the texts was calculated, and the selection threshold was empirically set to 0.52 on a scale from 0 to 1. The resulting collection of parallel-text candidates was then manually inspected to identify genuinely parallel texts.
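The threshold-based selection step can be illustrated with a minimal sketch. It assumes that each candidate pair already carries a parallelity score between 0 and 1; the `CandidatePair` type, the `select_candidates` function, the example URLs, and the inclusive comparison are all hypothetical and are not taken from the corpus tooling. Only the value 0.52 comes from the description.

```python
from dataclasses import dataclass

# Selection threshold reported in the corpus description.
THRESHOLD = 0.52


@dataclass(frozen=True)
class CandidatePair:
    """Hypothetical record for one Croatian-English document pair."""
    hr_url: str
    en_url: str
    score: float  # parallelity score on a scale from 0 to 1


def select_candidates(pairs, threshold=THRESHOLD):
    """Keep pairs whose score reaches the threshold.

    Treating the threshold as inclusive is an assumption; the
    description does not say how ties are handled.
    """
    return [p for p in pairs if p.score >= threshold]


# Made-up example data, for illustration only.
pairs = [
    CandidatePair("http://example.hr/a", "http://example.hr/en/a", 0.81),
    CandidatePair("http://example.hr/b", "http://example.hr/en/b", 0.52),
    CandidatePair("http://example.hr/c", "http://example.hr/en/c", 0.30),
]

# These candidates would then go on to manual inspection.
selected = select_candidates(pairs)
```

In this sketch the output of the filter is only a candidate list; according to the description, the final decision on whether texts are really parallel was made by manual inspection.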