Graded Text Analysis

The graded text analyser displays words graded by how frequently they are used in the English language. This type of analysis is useful for selecting vocabulary appropriate for learners, whether native or foreign, and is often used as a guide for writing graded readers (in conjunction with other guides like readability scores and selected grammar patterns).

User Interface:


The Graded Text Analyser compares the words used in the text to the British National Corpus (BNC) or the General Service List (GSL) of vocabulary.

BNC: about 14,000 words arranged into 14 levels in order of how common they are in informal, spoken English.

GSL: about 2,500 words. The first two levels have about 1,000 words each.

The default level is 1 for both the BNC and the GSL lists, but you can set them to a higher level by using the drop-down box.

How To Analyse a Text:

Start by selecting an appropriate lexicon and level. In "Vocabulary Size, Text Coverage and Word Lists", Paul Nation and Robert Waring suggest that the first 2,000 to 3,000 high-frequency words provide a solid vocabulary foundation for a learner, and that mastering these words should be a high priority. Texts for learners should also be targeted at a level where the learner can understand about 95% of the content. At that level the learner may be able to infer the meaning of unknown words and sentences from context.
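The 95% target is a simple ratio: the number of running words in the text that the learner is expected to know, divided by the total number of running words. The sketch below illustrates that arithmetic only. The function names, the tokenising rule and the toy word list are assumptions made for this example, not the analyser's actual code, and it matches exact word forms rather than headwords.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def coverage(text, known_words):
    """Return the fraction of running words in `text` found in `known_words`."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in known_words)
    return known / len(tokens)

# Hypothetical toy word list standing in for a real graded lexicon level.
toy_known = {"the", "cat", "sat", "on", "a", "mat", "and", "saw", "dog"}
sample = "The cat sat on the mat and saw a dog yawn"

# 10 of the 11 running words are in the toy list.
print(f"{coverage(sample, toy_known):.0%}")
```

A text scoring below roughly 0.95 on a check like this would be a candidate for simplification before it is given to learners at that level.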

For this application each level represents about 1,000 words graded by frequency. The two graded lexicons provided are the BNC (British National Corpus) and GSL (General Service List).

The GSL contains 2,000 words and was developed in the 1940s by Michael West, based on a 5,000,000-word written corpus. Although it is old and contains some errors, it is still widely considered the best available list, largely because of its carefully considered grading criteria. The complete GSL (GSL level 3 for this application) covers 75 to 90 percent of written texts, depending on which researcher you ask.

The BNC list was created by Dr. Paul Nation using similar criteria to the GSL and is taken from the ten-million-token spoken section of the British National Corpus. Because it was taken from a spoken corpus, it represents a more informal use of English. The proper nouns and interjections from this list were separated out and are used as stopwords for both lexicons. The first three levels of this list cover about 95% of written texts.

Once you have selected an appropriate lexicon and level, view the text and pick out the words flagged as out of level. Check the Headwords tab to find out how far out of range they are.
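The flagging step can be pictured as a lookup of each word against the graded lexicon, skipping stopwords. The following is a minimal sketch under stated assumptions: the lexicon is modelled as a plain mapping from word to level, the words and levels shown are invented for illustration, and exact word forms are matched instead of headwords. It is not the analyser's own implementation.

```python
import re

# Hypothetical excerpt of a graded lexicon: word -> level (1 = most frequent).
TOY_LEXICON = {"the": 1, "dog": 1, "ran": 1, "to": 1, "in": 1,
               "river": 2, "swiftly": 4}
# Hypothetical stopwords (proper nouns, interjections) that are never flagged.
TOY_STOPWORDS = {"london", "oh"}

def flag_out_of_level(text, lexicon, stopwords, target_level):
    """Map each out-of-level word to its lexicon level (None if unlisted)."""
    flagged = {}
    for word in re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower()):
        if word in stopwords:
            continue
        level = lexicon.get(word)
        if level is None or level > target_level:
            flagged[word] = level
    return flagged

result = flag_out_of_level(
    "Oh, the dog ran swiftly to the river estuary in London.",
    TOY_LEXICON, TOY_STOPWORDS, target_level=1,
)
print(result)
```

In this toy run "river" is one level above the target, "swiftly" is three levels above it, and "estuary" is not in the lexicon at all, which mirrors the "how far out of range" information described above.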

For flagged words that occur frequently in your text, you might want to swap them for more commonly used synonyms or add them to your syllabus to teach formally. Try to avoid words that are far out of level or not in either lexicon. Make changes to your text and re-run it through the analyser until you are satisfied it is appropriate for your students.
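One way to decide which flagged words deserve attention first is to count how often each occurs in the text. The sketch below assumes you already have a list of tokens and a set of flagged words; both inputs and the function name are hypothetical and only illustrate the idea.

```python
from collections import Counter

def rank_flagged(tokens, flagged_words):
    """Count how often each flagged word occurs, most frequent first."""
    counts = Counter(token for token in tokens if token in flagged_words)
    return counts.most_common()

# Hypothetical tokens and flagged words for illustration.
tokens = "the estuary was wide and the estuary was swift but murky".split()
flagged = {"estuary", "swift", "murky"}

ranked = rank_flagged(tokens, flagged)
for word, count in ranked:
    print(word, count)
```

Words at the top of such a ranking are the ones most worth replacing or teaching formally, since they affect the most running text.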

Contributed by Alex Russell

Graded BNC and GSL lexicons courtesy of Dr. Paul Nation.