Analyzing a Large Volume of Text

jbuccola

New member
Joined
Mar 12, 2016
Member Type
Student or Learner
Native Language
American English
Home Country
United States
Current Location
United States
Hello,

I used up the 20 analysis entries included with the "Advanced Text" feature on the site. The problem is that I have about 8,000 essays of roughly 500 words each to analyze. Does anyone know of a tool that would be effective for this purpose?

Ideally, the tool would have analysis aggregated by date, author, etc.

Thanks!
 

A web service or API that I could call to get similar analysis would also work; I could handle the aggregation on my end.
 

After much digging, I found Readability-Score.com, which has a very accessible interface for uploading URLs in bulk and even offers PHP source code for anyone who wants a deeper integration.

I uploaded thousands of URLs in CSV/Excel format, and the service emailed me a file of readability scores once processing finished. It averaged 4 URLs per second and appended the following fields to the list I provided:


  • Flesch-Kincaid Reading Ease
  • Flesch-Kincaid Grade Level
  • Gunning-Fog Score
  • Coleman-Liau Index
  • SMOG Index
  • Automated Readability Index
  • Average Grade Level
  • Character Count
  • Syllable Count
  • Word Count
  • Sentence Count
  • Characters per Word
  • Syllables per Word
  • Words per Sentence
  • Letters per Word

datagrab.JPG
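For anyone who would rather compute a few of these fields locally, here is a minimal Python sketch of the standard Flesch Reading Ease, Flesch-Kincaid Grade Level, and Automated Readability Index formulas. It is not the service's implementation: the function names are made up for illustration, and the syllable counter is a rough vowel-group heuristic, so scores will differ somewhat from what Readability-Score.com reports.

```python
import re


def count_syllables(word):
    """Rough heuristic (an assumption, not a dictionary lookup):
    count runs of vowels and drop a silent trailing 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text):
    """Return basic counts and three readability scores for one text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")

    syllables = sum(count_syllables(w) for w in words)
    letters = sum(len(w.replace("'", "")) for w in words)

    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    letters_per_word = letters / len(words)

    return {
        "word_count": len(words),
        "sentence_count": len(sentences),
        "syllable_count": syllables,
        "words_per_sentence": words_per_sentence,
        "syllables_per_word": syllables_per_word,
        "letters_per_word": letters_per_word,
        # Standard published formulas for each index.
        "flesch_reading_ease": 206.835
        - 1.015 * words_per_sentence
        - 84.6 * syllables_per_word,
        "flesch_kincaid_grade": 0.39 * words_per_sentence
        + 11.8 * syllables_per_word
        - 15.59,
        "automated_readability_index": 4.71 * letters_per_word
        + 0.5 * words_per_sentence
        - 21.43,
    }


scores = readability("The cat sat on the mat. It was happy.")
print(scores["word_count"], scores["sentence_count"], scores["syllable_count"])
```

Running `readability` over each essay and writing the resulting dictionaries out with `csv.DictWriter` would give a file shaped much like the one the service emails back.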


I added a few key fields of my own (date and author, for example) so that I could aggregate and sort the results, and I am analyzing the data with Microsoft's excellent (and free) Power BI tool.
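The same kind of aggregation can also be sketched in a few lines of Python, for anyone not using Power BI. This is only an illustration: the column names (`author`, `year`, `average_grade_level`) and the sample rows are invented for the example, not taken from the real export.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def average_by(rows, key, value):
    """Group rows by the `key` column and average the `value` column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(float(row[value]))
    return {k: mean(v) for k, v in sorted(groups.items())}


# Made-up sample data standing in for the scored file plus the extra fields.
SAMPLE_CSV = """url,author,year,average_grade_level
http://example.com/a,alice,2010,6.0
http://example.com/b,bob,2010,7.0
http://example.com/c,alice,2011,9.0
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
by_year = average_by(rows, "year", "average_grade_level")
by_author = average_by(rows, "author", "average_grade_level")
print(by_year)
print(by_author)
```

With a real file, replacing `io.StringIO(SAMPLE_CSV)` with `open("scores.csv", newline="")` (the file name is hypothetical) would read the data from disk instead.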

Here's an example visualization of the readability results charted by year. The bars show average grade level, with year on the X axis and grade level on the Y axis. You can see that grade level rose from 6th grade to 12th+ over time for the same cohort of authors:

readabilityvis.JPG


Thanks to this site for getting me started!
 