[Corpora-List] spanish tokenizer

Marco Baroni baroni at sslmit.unibo.it
Mon Oct 16 17:00:04 CEST 2006

The freeling suite includes an open source Spanish tokenizer implemented in

Maria Esteva wrote:

> Dear all,

> I am a PhD student in the School of Information, University of Texas at
> Austin. For my dissertation, I will text-mine a large set of corporate
> electronic records in Spanish. For this, I need to find an open source
> Spanish tokenizer, if possible in C++, although other languages would be
> fine as well. I am familiar with the Lucene tool set, so if you know
> of another source where I can find this tool, I would appreciate your
> help.

> Thanks in advance,

> Maria Esteva
