[Corpora-List] help with n-grams

Marc FRYD marc.fryd at univ-poitiers.fr
Sun Oct 26 09:19:26 CET 2008


Hi all,

I wonder if anyone could help a linguist with moderate programming abilities with the following task. I am currently working on a corpus of isolated words with aligned grapheme-to-phoneme transcriptions. I would like to produce an n-gram analysis of both levels of data (the graphemic and the phonemic), with a view to extracting the contextual trends that favour particular realisations (i.e. this grapheme is realised as that phoneme at a rate of x when preceded or followed by such and such graphemes). The database currently holds c. 3,000 words, but it will keep growing.

Cheers,
Marc

-- 
Dr. Marc FRYD
Senior Lecturer in English Linguistics

Faculté des Lettres et des Langues
Université de Poitiers
95 avenue du Recteur Pineau
86022, Poitiers, France

Office: 05 49 45 48 11
Cell: 06 76 28 18 50
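
One possible starting point for the task described in the message, sketched in Python. Everything about the data layout is an assumption, since the message does not describe the corpus format: each word is taken to be a list of (grapheme, phoneme) pairs that are already aligned one-to-one, "#" is a hypothetical word-boundary marker, and the four toy words and their transcriptions are invented for illustration only. The function names are likewise hypothetical. The sketch counts how often each grapheme is realised as each phoneme within a given window of neighbouring graphemes, then turns the counts into rates.

```python
from collections import Counter, defaultdict

PAD = "#"  # hypothetical word-boundary marker, not part of the corpus


def context_counts(words, left=1, right=1):
    """Count phoneme realisations per (left context, grapheme, right context).

    `words` is assumed to be a list of words, each a list of
    (grapheme, phoneme) pairs aligned one-to-one.
    """
    counts = defaultdict(Counter)
    for word in words:
        graphemes = [g for g, _ in word]
        # Pad both ends so that edge graphemes also get a full context window.
        padded = [PAD] * left + graphemes + [PAD] * right
        for i, (grapheme, phoneme) in enumerate(word):
            centre = i + left  # position of this grapheme in `padded`
            lctx = tuple(padded[centre - left:centre])
            rctx = tuple(padded[centre + 1:centre + 1 + right])
            counts[(lctx, grapheme, rctx)][phoneme] += 1
    return counts


def realisation_rates(counts):
    """Convert raw counts into relative frequencies for each context."""
    rates = {}
    for key, phonemes in counts.items():
        total = sum(phonemes.values())
        rates[key] = {p: n / total for p, n in phonemes.items()}
    return rates


# Invented toy data, for illustration only (not taken from the corpus).
toy = [
    [("c", "k"), ("a", "a"), ("t", "t")],
    [("c", "k"), ("o", "o"), ("t", "t")],
    [("c", "s"), ("i", "i"), ("t", "t"), ("y", "i")],
    [("c", "s"), ("e", "e"), ("ll", "l")],
]

# Word-initial "c", ignoring what follows: the realisation is ambiguous.
initial = realisation_rates(context_counts(toy, left=1, right=0))

# "c" conditioned on the following grapheme: the ambiguity disappears.
following = realisation_rates(context_counts(toy, left=0, right=1))

for (lctx, grapheme, rctx), dist in sorted(following.items()):
    print(lctx, grapheme, rctx, dist)
```

Varying `left` and `right` gives wider or narrower grapheme windows. The same counting could be applied to the phonemic level by swapping the order of each pair, again assuming a one-to-one alignment. With only a few thousand words, rates for wide contexts will rest on very few tokens, so it is probably worth keeping the raw counts alongside the rates.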


