Chinese Treebank 8.0

Taxonomy :

Chinese Treebank 8.0 consists of approximately 1.6 million words of annotated and parsed text from Chinese newswire, government documents, magazine articles, various broadcast news and broadcast conversation programs, web newsgroups and weblogs.


- Other info -

Language(s) : Chinese

Types : monolingual corpus
Domain : newswire, government documents, magazine articles, various broadcast news and broadcast conversation programs, web newsgroups and weblogs
Size : 3,007 text files in this release, containing 71,369 sentences, 1,620,561 words and 2,589,848 characters (hanzi or foreign characters)
Developers : Nianwen Xue, Xiuhong Zhang, Zixin Jiang, Martha Palmer, Fei Xia, Fu-Dong Chiou, Meiyu Chang
Availability : Registration required
Update : 11/15/2013