[Corpora-List] Free Java Tokenizer for English

Alexandre Rafalovitch arafalov at gmail.com
Thu Nov 20 18:00:26 CET 2008


I don't believe there is full agreement on tokenization rules for English (e.g. whether "don't" is one token or several), but have a look at jTokeniser (http://www.andy-roberts.net/software/jTokeniser/) and GATE (http://www.gate.ac.uk/).
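
To make the "don't" point concrete, here is a minimal sketch in plain Java that applies three common conventions to the same sentence. The class and method names (TokenizerSketch, whitespace, wordsAndSymbols, cliticAware) are made up for illustration; this is not the API of jTokeniser or GATE, and the regular expressions are only an assumption about what each convention might look like.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of jTokeniser or GATE.
class TokenizerSketch {

    // Letters/digits form a word; any other non-space character stands alone.
    private static final Pattern WORD_OR_SYMBOL =
            Pattern.compile("[\\p{L}\\p{N}]+|[^\\s\\p{L}\\p{N}]");

    // As above, but the clitic "n't" is split off as its own token
    // (roughly the Penn Treebank convention). ASCII apostrophe only.
    private static final Pattern CLITIC_AWARE =
            Pattern.compile("[\\p{L}\\p{N}]+?(?=n't\\b)|n't\\b|[\\p{L}\\p{N}]+|[^\\s\\p{L}\\p{N}]");

    // Convention A: split on whitespace; punctuation stays attached to words.
    static List<String> whitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Convention B: every punctuation mark, including the apostrophe, is a token.
    static List<String> wordsAndSymbols(String text) {
        return findAll(WORD_OR_SYMBOL, text);
    }

    // Convention C: "don't" becomes "do" + "n't".
    static List<String> cliticAware(String text) {
        return findAll(CLITIC_AWARE, text);
    }

    // Collect every successive match of the pattern, in order.
    private static List<String> findAll(Pattern pattern, String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String sentence = "I don't know.";
        System.out.println(whitespace(sentence));
        System.out.println(wordsAndSymbols(sentence));
        System.out.println(cliticAware(sentence));
    }
}
```

For "I don't know." the three methods give 3, 6 and 5 tokens respectively: "don't" stays whole, becomes "don", "'", "t", or becomes "do", "n't". Which one is right depends on what the downstream tools expect.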

Regards,

Alex.
Personal blog: http://blog.outerthoughts.com/
Research group: http://www.clt.mq.edu.au/Research/

On Thu, Nov 20, 2008 at 11:41 AM, ben dbabis samira <bendbabis_samira at yahoo.fr> wrote:
> Hi,
> Does anyone know of any free tokenizers implemented in Java for
> English text?
> Thanks for your help


