Chinese Treebank 8.0
Taxonomy : Corpus
Chinese Treebank 8.0 consists of approximately 1.6 million words of annotated and parsed text from Chinese newswire, government documents, magazine articles, various broadcast news and broadcast conversation programs, web newsgroups and weblogs.
- Other info -
Language(s) : Chinese
Types : monolingual corpus
Domain : newswire, government documents, magazine articles, various broadcast news and broadcast conversation programs, web newsgroups and weblogs
Size : 3,007 text files in this release, containing 71,369 sentences, 1,620,561 words, 2,589,848 characters (hanzi or foreign)
Developer : Nianwen Xue, Xiuhong Zhang, Zixin Jiang, Martha Palmer, Fei Xia, Fu-Dong Chiou, Meiyu Chang
Availability : Registration required
Update: 11/15/2013