The FreeTTS project includes a Java class, LetterToSoundImpl, that generates pronunciations for OOV words in CMU phone format, using a simple state-machine mechanism driven by a data file. The class is easy to run standalone, outside the FreeTTS project, as long as you have the data file. The logic is based on a paper referenced in the class's Javadoc; I'm not sure of the details.
I'm using it for OOV words that fall outside the standard CMU pronunciation dictionary, and it works fairly well.
http://freetts.sourceforge.net/javadoc/com/sun/speech/freetts/lexicon/LetterToSoundImpl.html
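In case it is useful, here is a rough sketch of the "dictionary first, letter-to-sound for OOV words" arrangement described above. Everything in it is illustrative rather than FreeTTS code: the class name OovFallback, the one-entry dictionary, and the toyLetterToSound method are hypothetical stand-ins. In real use the toy method would be replaced by a call into LetterToSoundImpl (see the Javadoc above for its actual constructor and method signatures), and the dictionary would be loaded from the CMU file.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

public class OovFallback {
    private final Map<String, String[]> dictionary;
    private final Function<String, String[]> letterToSound;

    public OovFallback(Map<String, String[]> dictionary,
                       Function<String, String[]> letterToSound) {
        this.dictionary = dictionary;
        this.letterToSound = letterToSound;
    }

    /** Returns the dictionary entry if present, otherwise the letter-to-sound guess. */
    public String[] pronounce(String word) {
        String key = word.toLowerCase(Locale.ROOT);
        String[] phones = dictionary.get(key);
        return (phones != null) ? phones : letterToSound.apply(key);
    }

    /**
     * Toy stand-in ONLY (hypothetical): maps a few letters to one phone each
     * and skips everything else. A real system would call a trained
     * letter-to-sound component here instead.
     */
    public static String[] toyLetterToSound(String word) {
        Map<Character, String> table = new HashMap<>();
        table.put('b', "B");
        table.put('l', "L");
        table.put('o', "AA");
        table.put('g', "G");
        List<String> phones = new ArrayList<>();
        for (char c : word.toCharArray()) {
            String phone = table.get(c);
            if (phone != null) {
                phones.add(phone);
            }
        }
        return phones.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Illustrative one-entry dictionary; a real one is loaded from the CMU file.
        Map<String, String[]> dict = new HashMap<>();
        dict.put("hello", new String[] {"HH", "AH0", "L", "OW1"});

        OovFallback lexicon = new OovFallback(dict, OovFallback::toyLetterToSound);
        System.out.println(String.join(" ", lexicon.pronounce("hello"))); // dictionary hit
        System.out.println(String.join(" ", lexicon.pronounce("blog")));  // OOV, falls back
    }
}
```

Passing the letter-to-sound step in as a Function keeps the lookup logic independent of whichever predictor you end up using (rule-based, neural net, or HMM).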
Hope this helps.
d

On 24-Apr-08, at 10:56 PM, Madiha Ijaz wrote:
> Dear all,
>
> A couple of days back I posted a query about transcribing English
> text into Urdu, and in response received some worthwhile suggestions.
> The one I am working on right now makes use of the CMU
> pronunciation dictionary, and it is working fine, but OOV words still
> remain a problem. One possible solution is to train neural nets or an
> HMM on the CMU pronunciation dictionary, which can later be used to
> predict the pronunciation of unknown words. So I wanted to know whether
> any related work has been done in this regard.
>
> Secondly, does any English pronunciation dictionary exist that
> provides syllabified word transcriptions instead of just plain
> transcriptions, or any tool that syllabifies English text?
>
> regards
> Madiha
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
David Ayre dave at ayre.ca http://www.gtrlabs.org http://www.linguity.com