My application involves narrative and informational texts across a variety of reading levels and genres. Most of the text is hand-edited to remove non-prose content, but any system that could handle unedited text robustly would be awesome, of course.
So far we've mostly been using hand-crafted tools written in Python. I have looked at what NLTK offers, but from what I've seen nothing in it is terribly accurate (it fails on obvious, common cases such as some honorifics). We did develop a decision-tree-based model using Weka for Spanish text. I'd be happy to do the same for English, but I wanted to see whether something good is already out there.
Thanks in advance!