[Corpora-List] Special-domain corpora
paulb at dfki.de
Wed Mar 30 12:59:01 CEST 2005
Carlos Rodriguez wrote:
> I was wondering if anyone could point me to domain corpora with the
> following characteristics:
> 1.- Written texts (ASCII, xml, txt, pdf, no need to be tagged) from
> specialized or technical domains.
If 1 million tokens is ok, you can try the MuchMore corpus of medical
DFKI - Language Technology Lab
> 2.- Open source, or reasonably priced, that can be downloaded to be
> processed (web-accessible through proprietary interfaces won't cut it).
> 3.- If possible, with machine-readable or electronic lexicons or
> dictionaries available for the domain represented by the corpora.
> I am thinking about experimenting with techniques for lexical
> Thanks and best to all,
> Carlos Rodríguez