In addition to the responses you get from this list, you might look into what the folks over at the ALLC (Association for Literary and Linguistic Computing) and the ACH (Association for Computers and the Humanities) are doing. Your question strikes me as the sort of topic they would be interested in.
> Are there any text corpora out there including phonemes also?
Not sure what you mean here. Are you referring to transcriptions of speech, which might include more or less free variation at the phonemic level (the two pronunciations of 'roof' and 'route'), dialectal variation at the phonemic level (such as whether 'pin' and 'pen' are homophones), or phonemes which cannot be inferred from a pronunciation dictionary (e.g. the present and past tense pronunciations of 'read')?
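To make the distinction concrete, here is a minimal sketch of why a pronunciation dictionary alone cannot settle some of these cases. Everything in it is an assumption for illustration: the dictionary `TOY_DICT`, the helper names `lookup` and `is_ambiguous`, and the ARPAbet-style transcriptions are hand-written, not drawn from any published lexicon or corpus.

```python
# Toy pronunciation dictionary. The ARPAbet-style transcriptions are
# hand-written for illustration, not taken from any published lexicon.
TOY_DICT = {
    "read": [("R", "IY", "D"), ("R", "EH", "D")],    # present vs. past tense
    "roof": [("R", "UW", "F"), ("R", "UH", "F")],    # more or less free variation
    "route": [("R", "UW", "T"), ("R", "AW", "T")],   # more or less free variation
    "pin": [("P", "IH", "N")],
    "pen": [("P", "EH", "N")],  # homophonous with "pin" in some dialects
}


def lookup(word):
    """Return every listed pronunciation of a word (empty list if unknown)."""
    return TOY_DICT.get(word.lower(), [])


def is_ambiguous(word):
    """True when the dictionary alone cannot pick a single pronunciation."""
    return len(lookup(word)) > 1


# Words whose phonemes cannot be recovered from the written text by lookup.
ambiguous = sorted(w for w in TOY_DICT if is_ambiguous(w))
print(ambiguous)  # ['read', 'roof', 'route']
```

For the ambiguous words, only a transcription of what the speaker actually said would tell you which phonemes occurred. The 'pin'/'pen' case is different again: the dictionary lists one form each, so the dialectal merger would not show up in a lookup at all.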
Mike Maxwell
CASL/ U MD