[Corpora-List] querying corpora

maxwell at umiacs.umd.edu
Fri Feb 29 15:52:35 CET 2008

> I was wondering about the kinds of queries you may run on open
> corpora out there
> ...
> Could you, say, run a query asking a corpus to give you the result
> about how many times, where in a sentence (both, as a distribution of
> the number of words, the POS elements used in them and the texts as a
> whole) did Shakespeare use words related to "love" (which you should
> be also able to query even with a certain level of "measurable
> relatedness") modified by an adverb and containing also an adjective
> within the sentence?

In addition to the responses you get from this list, you might look into what the folks over at the ALLC (Association for Literary and Linguistic Computing) and ACH (Association for Computers and the Humanities) are doing. That strikes me as the sort of topic they would be interested in.

> Are there any text corpora out there including phonemes also?

Not sure what you mean here. Are you referring to transcriptions of speech, which might include more or less free variation at the phonemic level (the two pronunciations of 'roof' and 'route'), dialectal variation at the phonemic level (such as whether 'pin' and 'pen' are homophones), or phonemes which cannot be inferred from a pronunciation dictionary (e.g. the present and past tense pronunciations of 'read')?

Mike Maxwell

