[Corpora-List] Readability statistics

Serge Sharoff S.Sharoff at leeds.ac.uk
Wed Feb 6 08:46:12 CET 2008


Thanks for an excellent overview of readability. The only point I disagree with is:
> (3) Sentences of the same length containing more closed class words are likely
> to be easier to process. (NB - high correlation w/ (2))
Add: 'for the native speaker'. We experimented with finding texts suitable for language learners, and the advice from language teachers is that 'closed class words are polluters in frequency lists' (common words derived from frequency lists are indeed a good approximation of reading difficulty, but not a perfect one). The reasoning is that closed class words indicate grammatical constructions. A greater number of them means that such constructions are more tightly packed in the sentence, so its structure is more difficult for a language learner to decode.
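The closed-class argument can be made measurable. Below is a minimal Python sketch, assuming a tiny hand-picked English closed-class list (`CLOSED_CLASS`, hypothetical and far from complete) and crude regex tokenization. It computes the share of closed-class tokens in a sentence and the share of tokens covered by a list of common words. How to read a high closed-class density depends on the audience: following the reasoning above, it may mean easier text for a native speaker but more densely packed grammar for a learner. All names here are illustrative, not taken from any existing tool.

```python
import re

# Hypothetical, tiny closed-class list for English; a real one would cover
# all determiners, pronouns, prepositions, conjunctions, auxiliaries, etc.
CLOSED_CLASS = {
    "a", "an", "the", "of", "in", "on", "to", "and", "or", "but",
    "that", "which", "who", "it", "he", "she", "they", "is", "was",
    "be", "been", "have", "has", "had", "would", "could", "by", "with",
}


def tokenize(sentence: str) -> list[str]:
    """Lower-case word tokens; a crude stand-in for a real tokenizer."""
    return re.findall(r"[a-z']+", sentence.lower())


def closed_class_density(sentence: str) -> float:
    """Share of tokens in the sentence that are closed-class words."""
    tokens = tokenize(sentence)
    if not tokens:
        return 0.0
    return sum(t in CLOSED_CLASS for t in tokens) / len(tokens)


def frequency_coverage(text: str, common_words: set[str]) -> float:
    """Share of tokens found in a set of common words (e.g. the top N of a frequency list)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(t in common_words for t in tokens) / len(tokens)
```

For two 13-word sentences, "The man who had been seen by the police was in the house" has a closed-class density of 9/13, while "Heavy autumn rain flooded several narrow streets near the old covered market yesterday" has 1/13.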

> but they are used by some publishers, and hence are likely to have some
> empirical basis (although I have not seen the science). What relation they
Some of them are derived empirically; for an introduction, see http://en.wikipedia.org/wiki/Readability_survey
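As one example of an empirically derived formula, the Flesch Reading Ease score is 206.835 - 1.015 x (words per sentence) - 84.6 x (syllables per word), where higher scores mean easier text. Below is a minimal Python sketch. The syllable counter is a crude vowel-group heuristic for English (an assumption, not a standard algorithm), and the coefficients were fitted for native-speaker English, so the score says nothing about learners or about languages such as Chinese.

```python
import re


def count_syllables(word: str) -> int:
    """Crude English heuristic: count groups of consecutive vowels (y included),
    dropping a silent final 'e' (but not '-le'). Real tools use dictionaries."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1 and not word.endswith("le"):
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher means easier (for native English readers)."""
    # Split on sentence-final punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
```

For example, "The cat sat on the mat." has 6 one-syllable words in one sentence, giving 206.835 - 1.015 x 6 - 84.6 x 1 = 116.145.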

However, AFAIK, all this work has been done with native speakers in mind, so I'm interested in any hints on automated readability tests for language learners, preferably covering a wide range of languages such as Chinese or Russian. My guess is that word length does not correlate with readability in Chinese, while in Russian irregular word endings might present a greater problem than predictable longer words.

