This Standard Corpus of Present-Day American English consists of 1,014,312 words of running text of edited English prose printed in the United States during the calendar year 1961. So far as it has been possible to determine, the writers were native speakers of American English. Although all of the material first appeared in print in the year 1961, some of it was undoubtedly written earlier. However, no material known to be a second edition or reprint of earlier text has been included.
The UCL Centre for English Corpus Linguistics (CECL) is a specialist research centre with two core areas of research activity:
1. Computer learner corpus research
2. Cross-linguistic research
DDC is intended to facilitate all varieties of research that require dialogues from multiple situations as data. For studies of dialogue dynamics, situational effects in dialogue, dialogue coherence, dialogue genre comparison, studies of role and status in dialogue and many other topics, very diverse dialogue data must be brought to bear on single studies.
The Berkeley FrameNet project is a lexicon-building effort in which we (1) study words; (2) describe the frames or conceptual structures which underlie these; (3) examine sentences, using a very large corpus of contemporary English that contains these words; and (4) record the ways in which information from the associated frames is expressed in these sentences.
Welcome to the on-line, searchable part of our collection of transcripts of academic speech events recorded at the University of Michigan.
There are currently 152 transcripts (totaling 1,848,364 words) available at this site.
The Natural Language and Computational Linguistics (NLCL) group (part of the Department of Informatics at the University of Sussex) is one of the largest groups in the UK of researchers focusing on statistical and corpus-based approaches to natural language processing.
A concordance gives a list of several words, phrases, or distributed structures along with immediate contexts, from a corpus or other collection of texts assembled for language study.
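As a rough illustration of what a concordance looks like in practice, the sketch below builds a simple keyword-in-context listing in Python. The function name `concordance`, the `window` parameter, and the sample sentence are all made up for this example; it is not the method used by any of the tools listed here, and it handles only single-word keywords with a naive tokenizer.

```python
import re

def concordance(text, keyword, window=3):
    """Return (left context, keyword, right context) for each hit.

    Hypothetical helper for illustration: tokenizes on runs of letters
    and apostrophes, and matches the keyword case-insensitively.
    """
    tokens = re.findall(r"[A-Za-z']+", text)
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            # Up to `window` tokens on each side of the match.
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits

# Made-up sample text, not drawn from any of the corpora described here.
sample = "The cat sat on the mat. The dog chased the cat across the yard."

for left, kw, right in concordance(sample, "cat", window=3):
    # Right-align the left context so the keywords line up in a column.
    print(f"{left:>20}  [{kw}]  {right}")
```

Aligning the keyword in a fixed column is the usual display convention, since it makes recurring patterns to the left and right of the word easy to scan.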
This website allows you to quickly and easily search for a wide range of words and phrases of English in the 100-million-word British National Corpus. You can search for words and phrases by exact word or phrase, wildcard or part of speech, or combinations of these.
Computational linguistics (CL) is a discipline between linguistics and computer science which is concerned with the computational aspects of the human language faculty. It belongs to the cognitive sciences and overlaps with the field of artificial intelligence (AI), a branch of computer science aiming at computational models of human cognition. Computational linguistics has applied and theoretical components.
Word frequency lists and dictionary from the Corpus of Contemporary American English
WordSmith Tools is lexical analysis software for the PC. It has been published by Oxford University Press since 1996 and is now at version 4.0.