> Hi Jeff,
> if you want to reuse translators' resources (and computer-aided
> translation tools need text segmented into sentences), you can use the
> SRX standard. I have authored some rules for English, though they are not
> perfect (I have a much better set of rules for Polish). The open-source
> library that supports SRX, segment, is also pretty fast.
In case you're interested in using SRX rules, you may also consider trying our C++ implementation <http://nlp.pwr.wroc.pl/redmine/projects/toki/wiki/> (GNU LGPL). Its processing speed, measured in tokens per second, is similar to that of Marcin Miłkowski's Java segment tool, but if you need to process many short texts, it may be convenient to avoid the Java VM start-up time.
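
For readers who have not seen SRX before, here is a rough sketch of the idea behind it. An SRX rule is either a break or a no-break rule and carries two regular expressions, one for the text before a candidate break position and one for the text after it; rules are tried in order and the first one that matches decides. The Python below only illustrates that matching scheme. It is not the API of our C++ implementation or of segment, it does not read SRX files, and the two rules and the name `split_sentences` are made up for the example.

```python
import re

# Hypothetical SRX-like rules: (is_break, before_break_regex, after_break_regex).
# Order matters: the first rule that matches at a position decides the outcome.
RULES = [
    # No-break rule: do not split after a few common abbreviations.
    (False, r"\b(?:Mr|Mrs|Dr|Prof|e\.g|i\.e)\.", r"\s"),
    # Break rule: split after sentence-final punctuation followed by a capital.
    (True, r"[.?!]+", r"\s+[A-Z]"),
]


def split_sentences(text, rules=RULES):
    """Split text into sentences using ordered break / no-break rules."""
    compiled = [
        # Anchor the "before" pattern at the end of the text preceding the position.
        (is_break, re.compile("(?:" + before + r")\Z"), re.compile(after))
        for is_break, before, after in rules
    ]
    sentences, start = [], 0
    for pos in range(1, len(text)):
        prefix = text[:pos]
        for is_break, before, after in compiled:
            if before.search(prefix) and after.match(text, pos):
                if is_break:
                    sentences.append(text[start:pos].strip())
                    start = pos
                break  # first matching rule wins, whether break or no-break
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences
```

With these two rules, `split_sentences("Dr. Smith arrived. He was late! Was he? Yes.")` returns `['Dr. Smith arrived.', 'He was late!', 'Was he?', 'Yes.']`: the no-break rule is listed first, so it prevents a split after "Dr." before the break rule is ever consulted. The sketch rescans the prefix at every position, which is fine for illustration but far slower than a real implementation.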
Best,
Adam Radziszewski