ESL Web Directory

Corpus Linguistics



Brown Corpus Manual

This Standard Corpus of Present-Day American English consists of 1,014,312 words of running text of edited English prose printed in the United States during the calendar year 1961. So far as it has been possible to determine, the writers were native speakers of American English. Although all of the material first appeared in print in the year 1961, some of it was undoubtedly written earlier. However, no material known to be a second edition or reprint of earlier text has been included.

Centre for English Corpus Linguistics

The UCL Centre for English Corpus Linguistics (CECL) is a specialist research centre with two core areas of research activity: (1) computer learner corpus research and (2) cross-linguistic research.

Dialogue Diversity Corpus

DDC is intended to facilitate all varieties of research that require dialogues from multiple situations as data. For studies of dialogue dynamics, situational effects in dialogue, dialogue coherence, dialogue genre comparison, studies of role and status in dialogue and many other topics, very diverse dialogue data must be brought to bear on single studies.


FrameNet

The Berkeley FrameNet project is a lexicon-building effort in which we (1) study words; (2) describe the frames or conceptual structures which underlie these; (3) examine sentences, using a very large corpus of contemporary English that contains these words; and (4) record the ways in which information from the associated frames is expressed in these sentences.

Michigan Corpus of Academic Spoken English

Welcome to the on-line, searchable part of our collection of transcripts of academic speech events recorded at the University of Michigan. There are currently 152 transcripts (totaling 1,848,364 words) available at this site.

Natural Language and Computational Linguistics

The Natural Language and Computational Linguistics (NLCL) group (part of the Department of Informatics at the University of Sussex) is one of the largest groups in the UK of researchers focusing on statistical and corpus-based approaches to natural language processing.

Online Concordancers

A concordance gives a list of several words, phrases, or distributed structures along with immediate contexts, from a corpus or other collection of texts assembled for language study.
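As a rough illustration of what a concordancer produces, the following minimal sketch prints keyword-in-context lines for a search word. The function name `concordance`, the `width` parameter, and the sample text are all hypothetical and chosen only for this example. They are not taken from any of the tools listed here, which typically offer far richer search options.

```python
import re

def concordance(text, keyword, width=30):
    """Return one keyword-in-context line per whole-word match of keyword."""
    lines = []
    # Match the keyword as a whole word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for m in pattern.finditer(text):
        # Take up to `width` characters of context on each side of the match.
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines

# Hypothetical sample text standing in for a corpus.
sample = "The cat sat on the mat. A cat is a small animal. Dogs chase the cat."
for line in concordance(sample, "cat", width=15):
    print(line)
```

Aligning the keyword in a fixed column is the conventional layout for such listings, since it makes recurring patterns to the left and right of the word easy to scan.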

Variation in English words and phrases

This website allows you to quickly and easily search for a wide range of English words and phrases in the 100-million-word British National Corpus. You can search by exact word or phrase, wildcard, or part of speech, or by combinations of these.

What is Computational Linguistics?

Computational linguistics (CL) is a discipline between linguistics and computer science which is concerned with the computational aspects of the human language faculty. It belongs to the cognitive sciences and overlaps with the field of artificial intelligence (AI), a branch of computer science aiming at computational models of human cognition. Computational linguistics has applied and theoretical components.

Word Frequency

Word frequency lists and a dictionary from the Corpus of Contemporary American English.

WordSmith Tools

WordSmith Tools is lexical analysis software for the PC, published by Oxford University Press since 1996 and now at version 4.0.