[Corpora-List] spanish tokenizer

Maria Esteva mesteva at mail.utexas.edu
Mon Oct 16 15:43:01 CEST 2006


Dear all,

I am a PhD student in the School of Information at the University of
Texas at Austin. For my dissertation, I will text-mine a large set of
corporate electronic records in Spanish. For this, I need to find an
open-source Spanish tokenizer, preferably in C++, although other
languages would be fine as well. I am already familiar with the Lucene
tool set, so if you know of another source where I can find such a
tool, I would appreciate your help.

Thanks in advance,

Maria Esteva