[Corpora-List] Special-domain corpora

Paul Buitelaar paulb at dfki.de
Wed Mar 30 12:59:01 CEST 2005


Carlos Rodriguez wrote:


> Hi,
>
> I was wondering if anyone could point me to domain corpora with the
> following characteristics:
>
> 1.- Written texts (ASCII, xml, txt, pdf, no need to be tagged) from
> specialized or technical domains.


If 1 million tokens is ok, you can try the MuchMore corpus of medical
texts (German/English):

http://muchmore.dfki.de/resources1.htm

Cheers,


Paul Buitelaar
DFKI - Language Technology Lab
Saarbruecken, Germany


> 2.- Open source, or reasonably priced, that can be downloaded to be
> processed (web-accessible through proprietary interfaces won't cut it).
> 3.- If possible, with machine-readable or electronic lexicons or
> dictionaries available for the domain represented by the corpora.
>
> I am thinking about experimenting with techniques for lexical
> acquisition.
>
> Thanks and best to all,
>
> Carlos Rodríguez







