Thread: New Corpus

  1. VIP Member
    Retired English Teacher
    • Member Info
      • Native Language:
      • British English
      • Home Country:
      • Europe
      • Current Location:
      • Czech Republic

    • Join Date: Jul 2015
    • Posts: 14,727
    #1

    New Corpus

    Members who use corpora may be interested in the email I received today:

    As a user of the BYU suite of corpora, you might be interested in the new 14 billion word iWeb corpus, which was just released. In our estimation, iWeb is the most important and exciting corpus from the BYU suite of corpora since COCA was released more than 10 years ago.

    At 14 billion words, iWeb is more than 25 times as large as the 560 million word COCA corpus. iWeb also has a much wider range of web-based materials than does COCA, since it is based on 22 million web pages in nearly 100,000 carefully selected websites (based on Alexa.com, from Amazon).

    New in iWeb is the ability to browse through the top 60,000 words in the corpus, and to search this list by word form, part of speech, rank (#1-60,000), and even pronunciation.

    Most importantly, you can then see detailed information on each of the top 60,000 words in the corpus – definition, frequency information, synonyms and other related words (from WordNet, word families, MRC, etc), collocates (in a much improved format), related “topics” (perhaps much more useful than collocates), “clusters” (new in iWeb), relevant websites, and sample concordance/KWIC lines. There are extensive hyperlinks on each page, which allow you to quickly and easily move from one word to a number of related words.

    In addition, for each of these 60,000 words, there are “quick links” to related data from other websites – pronunciation, additional definitions, images, videos, and translations (for more than 100 languages).

    iWeb also allows you to quickly and easily create “virtual corpora” on nearly any topic, and these virtual corpora can then be searched as their own “stand-alone” corpora, or compared to other virtual corpora that you have created.

    Finally, in terms of “standard” corpus searches, we note that (due to improvements in the corpus architecture) iWeb is faster than any of the other BYU corpora, and in most cases it is also much faster than other large, 10-20 billion word online corpora.

    For a short overview of the corpus (in graphical format, with an emphasis on the new features), please see:

    https://corpus.byu.edu/iweb/help/iweb_overview.pdf

    We hope that this new corpus is useful to you in your teaching, learning, and research.

    Best,

    Mark Davies
    BYU Corpora

    ============================================
    Mark Davies
    Professor of Linguistics / Brigham Young University
    http://davies-linguistics.byu.edu/
    ** Corpus design and use // Linguistic databases **
    ** Historical linguistics // Language variation **
    ** English, Spanish, and Portuguese **
    ============================================

  2. jutfrank
    VIP Member
    English Teacher
    • Member Info
      • Native Language:
      • English
      • Home Country:
      • England
      • Current Location:
      • England

    • Join Date: Mar 2014
    • Posts: 7,829
    #2

    Re: New Corpus

    Excellent. I'll have a play later. Thanks for sharing.

  3. teechar
    Moderator
    English Teacher
    • Member Info
      • Native Language:
      • English
      • Home Country:
      • Iraq
      • Current Location:
      • Iraq

    • Join Date: Feb 2015
    • Posts: 8,834
    #3

    Re: New Corpus

    Very useful. I have made this a sticky thread.
