[Corpora-List] Readability statistics

John F. Sowa sowa at bestweb.net
Mon Feb 11 07:57:36 CET 2008

Serge and Steve,

Many different kinds of constructions involve closed class words. Some might be easier and some harder to process by a native speaker, a learner, or a computer. And learners whose native languages have very different structures might have different degrees of difficulty.

SS> The only point I disagree with is:

SF>> Sentences of the same length containing more closed class words are likely to be easier to process.

For example, the word 'that' is optional in the following sentences:

This is the house [that] Jack built.

Tom believes [that] the moon is made of green cheese.

Including the word 'that' in such sentences increases the number of closed class words, but it can sometimes speed up the parsing. Without 'that', a parser might interpret 'the moon' as the direct object of 'believes' and switch to a different interpretation when it finds the word 'is'.
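
A rough way to see the effect on the counts is the Python sketch below. It compares the two versions of each sentence above. The word list CLOSED_CLASS and the function names are illustrative assumptions, not a standard inventory of English closed class words, and the count says nothing about actual parsing time.

```python
import re

# Small illustrative list of closed class words (an assumption for this
# sketch, far from a complete inventory of English function words).
CLOSED_CLASS = {
    "a", "an", "the", "this", "that", "these", "those",
    "is", "are", "was", "were", "be", "been",
    "of", "in", "on", "at", "by", "for", "to", "with",
    "and", "or", "but", "if",
}


def tokenize(sentence):
    """Lowercase the sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def closed_class_count(sentence):
    """Count the tokens that appear in the illustrative closed class list."""
    return sum(1 for token in tokenize(sentence) if token in CLOSED_CLASS)


def closed_class_ratio(sentence):
    """Return the fraction of tokens that are closed class words."""
    tokens = tokenize(sentence)
    return closed_class_count(sentence) / len(tokens) if tokens else 0.0


PAIRS = [
    ("This is the house Jack built.",
     "This is the house that Jack built."),
    ("Tom believes the moon is made of green cheese.",
     "Tom believes that the moon is made of green cheese."),
]

for without_that, with_that in PAIRS:
    for sentence in (without_that, with_that):
        print(closed_class_count(sentence),
              round(closed_class_ratio(sentence), 2),
              sentence)
```

In both pairs the version with 'that' has one more closed class word and a higher ratio, although it is the version without 'that' that invites the temporary misreading.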

Another example is a long chain of noun-noun modifiers, which might be easier to process if some prepositions were added to break up the chain.
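
As a minimal sketch of the noun-chain point, the Python code below measures the longest run of consecutive nouns in a phrase. The two phrases, their hand-assigned part-of-speech tags (Penn Treebank style), and the function name longest_noun_run are all hypothetical illustrations. A real system would get the tags from a tagger, and run length is only a crude proxy for processing difficulty.

```python
def longest_noun_run(tagged):
    """Return the length of the longest run of consecutive noun tags.

    `tagged` is a list of (word, tag) pairs; any tag starting with "NN"
    is treated as a noun.
    """
    longest = current = 0
    for _word, tag in tagged:
        if tag.startswith("NN"):
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


# Hand-tagged example phrases (assumed for illustration only).
CHAIN = [
    ("government", "NN"), ("health", "NN"), ("policy", "NN"),
    ("review", "NN"), ("committee", "NN"),
]

WITH_PREPOSITIONS = [
    ("committee", "NN"), ("for", "IN"), ("the", "DT"), ("review", "NN"),
    ("of", "IN"), ("government", "NN"), ("policy", "NN"),
    ("on", "IN"), ("health", "NN"),
]

print(longest_noun_run(CHAIN))              # 5
print(longest_noun_run(WITH_PREPOSITIONS))  # 2
```

The version with prepositions has more closed class words and more words overall, but its longest noun run drops from five to two.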

There are many examples that have different levels of difficulty for humans and machines. The word 'antidisestablishmentarianism' is a long word that might confuse a human reader, but a computer would immediately recognize it as a noun that has exactly one word sense.

John Sowa
