[Corpora-List] spanish tokenizer

Marco Baroni baroni at sslmit.unibo.it
Mon Oct 16 17:00:04 CEST 2006


The FreeLing suite includes an open source Spanish tokenizer implemented in
C++:

http://garraf.epsevg.upc.es/freeling/index.php

Regards,

Marco


Maria Esteva wrote:

> Dear all,
>
> I am a PhD student in the School of Information, University of Texas at
> Austin. For my dissertation, I will text mine a large set of corporate
> electronic records in Spanish. For this, I need to find an open source
> Spanish tokenizer, if possible in C++, although other languages would be
> fine as well. I am familiar with the Lucene tool set, so if you know
> of another source where I can find such a tool, I would appreciate your
> help.
>
> Thanks in advance,
>
> Maria Esteva





