I was wondering whether anybody is aware of ideas or automated procedures for reducing n-gram output by addressing a common problem: shorter n-grams are often just fragments of longer structures (e.g. the 5-gram 'at the end of the' as part of the 6-gram 'at the end of the day').
I am only aware of Paul Rayson's work on c-grams (collapsed-grams).
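To make the problem concrete, here is a minimal Python sketch of one naive reduction strategy: drop an n-gram whenever some (n+1)-gram that contains it has exactly the same frequency, on the assumption that the shorter string then never occurs independently. This is only an illustration of the kind of filtering I have in mind, not Rayson's c-gram procedure; the function names and the strict equal-frequency criterion are my own assumptions.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(tokens, min_n, max_n):
    """Count every n-gram for min_n <= n <= max_n."""
    counts = Counter()
    for n in range(min_n, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts


def drop_subsumed(counts):
    """Remove n-grams that have the same frequency as an (n+1)-gram containing them.

    Hypothetical criterion: equal frequency is taken to mean the shorter
    n-gram only ever occurs inside the longer one.
    """
    subsumed = set()
    for gram, freq in counts.items():
        if len(gram) < 2:
            continue
        # The two n-grams embedded in an (n+1)-gram: its prefix and its suffix.
        for sub in (gram[:-1], gram[1:]):
            if counts.get(sub) == freq:
                subsumed.add(sub)
    return {g: f for g, f in counts.items() if g not in subsumed}


if __name__ == "__main__":
    text = "at the end of the day we left at the end of the day"
    reduced = drop_subsumed(count_ngrams(text.split(), 5, 6))
    for gram, freq in sorted(reduced.items(), key=lambda item: -item[1]):
        print(freq, " ".join(gram))
```

The equal-frequency test is deliberately strict; a relative threshold (e.g. dropping the fragment when the longer n-gram accounts for most of its occurrences) might be more useful on real corpus data, but I do not know what works best in practice.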
PhD student, School of English Studies, University of Nottingham
aexid at nottingham.ac.uk