COCA UPDATE.


5jj

I am posting this news in this forum because this is where most COCA users work. Here is the latest news from Mark Davies:

Monday, 9 April 2012, 15:12
If you are interested in academic English – for teaching or learning – there are two new, free corpus-based resources that might be of interest to you. These are based on the 110 million words of academic texts in the Corpus of Contemporary American English [COCA] (85 million words in academic journals and 25 million words in more academically-oriented magazine articles).



1. The new site www.academicwords.info contains free COCA-based academic wordlists. There are important differences between these lists and the Academic Word List created by Coxhead (2000), and we believe that these new lists provide better coverage of academic English and that they have a format that better enhances learning and teaching. The three sets of word lists, which have been created in conjunction with Prof. Dee Gardner of BYU, are:

-- Word families (SAMPLE): The top 1000 word families of academic English (with nearly 3000 words total). Unlike the traditional Academic Word List, ours contains separate entries for different parts of speech, so you know, for example, whether abstract is used more as a noun, verb, or adjective. The words are also color-coded to let you know whether the word is a "general" academic word, or whether it is a more "technical" one that occurs in just a few sub-genres. And most importantly, the entries are listed in order of frequency, to help you focus more on words that you will actually see in the real world -- rather than just having a mass of unorganized words in each word family.


-- General “core” academic English (SAMPLE): The top 3500 words (lemmas) in COCA Academic (listed individually, rather than by word family)

-- Technical / sub-genre lists (SAMPLE): The top 1000 words in each of the nine academic sub-genres (Business, Law, Medicine, Science, Humanities, etc.)



2. We have created a new interface at www.wordandphrase.info/academic/ for just academic English. It has the same features as the general WordAndPhrase site, but all of the data is based strictly on the 110 million words of academic English in COCA.

-- Frequency listing: Browse through these lists (including word families) to see detailed information (all on one screen, with extensive links between resources): definition, frequency by academic sub-genre (e.g. Medicine, Business, Humanities), synonyms, and collocates and concordance lines (based just on academic English).

-- Input texts: As with the general interface, you can input an entire text (such as a journal article, or an academic paper that you have written) and it will give you detailed information about the words and phrases in the text. You can download word lists based on your text, and you can click on phrases in your text to see related phrases from COCA.

(By the way, if you previously had trouble accessing WordAndPhrase with your account, please try again; we’ve fixed a few bugs.)



I hope that these two new corpus-based resources on academic English will be of interest and value to you for teaching, learning, and research.

Best,

Mark Davies
Brigham Young University
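
For anyone who wants a rough offline feel for what the "Input texts" feature described above does, here is a minimal Python sketch. It is not how WordAndPhrase works internally: the function name `academic_coverage` and the four-word `sample_list` are hypothetical, and it matches plain word forms, whereas the announcement says the real lists are built on lemmas and parts of speech.

```python
import re
from collections import Counter


def academic_coverage(text, academic_words):
    """Return (share of tokens found in the list, matching words with counts)."""
    # Very rough tokenizer: lowercase alphabetic words, optional apostrophe part.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    counts = Counter(tokens)
    hits = {word: n for word, n in counts.items() if word in academic_words}
    total = sum(counts.values())
    share = sum(hits.values()) / total if total else 0.0
    # Most frequent matches first, ties broken alphabetically.
    return share, sorted(hits.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical mini word list, standing in for a downloaded academic list.
sample_list = {"abstract", "analysis", "data", "method"}
sample_text = "The analysis of the data follows the method; the data are abstract."

share, hits = academic_coverage(sample_text, sample_list)
print(f"{share:.0%} of tokens are on the list: {hits}")
```

With a real downloaded list in place of `sample_list`, the same function gives a quick estimate of how much of a paper is covered by the academic vocabulary.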
 
Mr Davies just keeps rolling on. :up:
 
Here's his latest:
We'd like to announce seven new resources and improvements associated with the BYU corpora, which have appeared in the last 3-4 months:

1. 100,000 word list: This is based on COCA, the BNC, SOAP, and COHA, and it is the largest, well-corrected word list of English available anywhere. It allows you to do powerful comparisons of the four corpora offline, as well as see and compare the frequencies in the five main genres in COCA (spoken, fiction, popular magazine, newspaper, and academic), the seven main genres in the BNC, and three main time periods in COHA. In addition, you can click on links in the spreadsheet to see the word in the different corpora online -- either the frequency by genre, or 100-200 sample concordance lines for each word. This is the most powerful and comprehensive word list that we have released yet.

------------------------

2. SOAP corpus: New 100 million word corpus of soap operas from the US from 2001-2012. This contains much more informal language than is found in any of the other corpora.

3. British National Corpus. We recently re-tagged this corpus (using CLAWS 7), which allows for much better comparisons with the other BYU corpora, and we've made several other improvements as well. For more details, click on the yellow asterisk in the header at the BNC website.

4. Side-by-side comparisons between the corpora. Before, although you could click to re-do a query in another corpus, you had to copy the data -- one corpus after another -- into another program to compare the data. Now, you can do a single click to see the data from two corpora -- side by side -- in the corpus interface, e.g. dialects (COCA and BNC), current/historical (COCA and COHA), or several genres vs. very informal (e.g. COCA and SOAP). Click here to see a number of interesting side-by-side comparisons of American (COCA) and British (BNC) English.

5. Corpus of Contemporary American English (COCA): We recently updated the corpus to 450 million words, to include texts from 2011 and 2012. Remember that COCA is the only large and balanced corpus of English that continues to be updated, to see ongoing changes in the language.

6. COCA and COHA: We've recently spent a lot of time and effort to eliminate duplicate texts from the 280,000 texts in these two corpora.

7. Google Books: We recently added four new datasets: British English (34 billion words), One Million Books (89 billion words), Fiction (91 billion words), and Spanish (45 billion words). And remember that you can always click on the drop-down list in the results in COCA, COHA, TIME, SOAP, or the BNC to re-do (with one click) your search in the billions of words of texts from Google Books. (Note: take a look at a good introduction to our Google Books interface, and how it compares to the standard, simple Google Books n-grams.)

Also, don't forget WordAndPhrase and AcademicWords, two new resources that were released in the first half of 2012. Remember that at WordAndPhrase, you can now enter entire texts, and see detailed information about the words and phrases in the text, based on COCA data.

Finally, just a reminder that if you have published anything that is based on the corpora in the past 6-9 months, we invite you to take a minute and enter that information.

We hope that these new and expanded resources will be helpful to you in your research and teaching.
Best,
Mark Davies
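
A note for anyone comparing corpora offline with the word list from item 1: the corpora differ in size, so raw counts need to be normalized first, usually to occurrences per million words. Below is a minimal Python sketch of that step. The corpus sizes are the ones quoted in the announcement (COCA 450 million, SOAP 100 million); the raw counts and the function names are made up for illustration and do not come from the actual spreadsheet.

```python
# Corpus sizes in words, as stated in the announcement above.
CORPUS_SIZES = {"COCA": 450_000_000, "SOAP": 100_000_000}


def per_million(raw_count, corpus_size):
    """Convert a raw frequency to occurrences per million words."""
    return raw_count * 1_000_000 / corpus_size


def compare(raw_counts, sizes=CORPUS_SIZES):
    """Normalize one word's raw counts so different corpora are comparable."""
    return {
        name: round(per_million(count, sizes[name]), 2)
        for name, count in raw_counts.items()
    }


# Hypothetical raw counts for a single word (not real corpus data).
made_up_counts = {"COCA": 9_000, "SOAP": 6_000}
rates = compare(made_up_counts)
print(rates)
```

In this invented example the word has the higher raw count in COCA but is three times as frequent per million words in SOAP, which is exactly the kind of difference raw counts hide.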
 
I got the email. That's a lot of improvement. ;-)
 