[Corpora-List] Reducing n-gram output

svetlana sheremetyeva linklana at yahoo.com
Tue Oct 28 11:15:26 CET 2008


Hi Irina,

I have just made a tool for keyword extraction (LanA-Key) which includes collapsing n-grams. It outputs up to 4-grams, but it can be updated to any "n". The tool can be downloaded for a 3-day free trial from http://lanaconsult.com
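As a rough illustration of what "collapsing" can mean here, the sketch below drops an n-gram whenever it occurs only as part of a longer n-gram, i.e. some (n+1)-gram containing it has exactly the same frequency. This is an assumed, simplified criterion for illustration only; it is not LanA-Key's actual method, nor necessarily how c-grams are computed. The function names `ngram_counts` and `collapse` are hypothetical.

```python
from collections import Counter


def ngram_counts(tokens, max_n):
    """Count all n-grams of length 1..max_n, stored as tuples of tokens."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def collapse(counts):
    """Drop n-grams that are fragments of an equally frequent longer n-gram.

    Assumption: a fragment is redundant only if an (n+1)-gram containing it
    has the same count, meaning the fragment never occurs outside it.
    """
    absorbed = set()
    for gram, freq in counts.items():
        if len(gram) < 2:
            continue
        # The two contiguous fragments one token shorter than this n-gram.
        for sub in (gram[:-1], gram[1:]):
            if counts.get(sub) == freq:
                absorbed.add(sub)
    return {g: f for g, f in counts.items() if g not in absorbed}


tokens = "at the end of the day we left at the end of the day".split()
kept = collapse(ngram_counts(tokens, max_n=6))
print(("at", "the", "end", "of", "the") in kept)
print(kept[("at", "the", "end", "of", "the", "day")])
```

In this toy text the 5-gram 'at the end of the' is removed because it only ever appears inside the 6-gram 'at the end of the day', while 'the' is kept because it is more frequent than any longer n-gram containing it. On real data one would probably also apply a minimum frequency threshold before collapsing.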

Regards,
Svetlana Sheremetyeva

--- On Mon, 10/27/08, Dahlmann Irina <aexid at nottingham.ac.uk> wrote:

From: Dahlmann Irina <aexid at nottingham.ac.uk>
Subject: [Corpora-List] Reducing n-gram output
To: CORPORA at uib.no
Date: Monday, October 27, 2008, 1:07 PM

Dear all,

I was wondering whether anybody is aware of ideas and/or automated processes for reducing n-gram output by solving the common problem that shorter n-grams can be fragments of larger structures (e.g. the 5-gram 'at the end of the' as part of the 6-gram 'at the end of the day').

I am only aware of Paul Rayson's work on c-grams (collapsed-grams).

Many thanks,

Irina Dahlmann

PhD student
School of English Studies
University of Nottingham
aexid at nottingham.ac.uk
