iWeb: The Intelligent Web-based Corpus

Taxonomy :

The iWeb corpus contains 14 billion words (about 25 times the size of COCA) in 22 million web pages. Unlike other large corpora from the web, the nearly 95,000 websites in iWeb were chosen systematically, and each website contributes an average of 240 web pages and 145,000 words. You can quickly narrow a search to specific websites to create “virtual corpora” for any topic, such as Buddhism, chocolate, basketball, or nuclear energy.

- Other info -

Language(s) : English
Types : monolingual corpus
Domain : web pages
Size : 14 billion words
Developer : Mark Davies
Availability : Free
Update : 05/2018