Hi Naveed,
NLTK provides a class named PunktSentenceTokenizer for sentence splitting. Its documentation describes it as follows:
Class PunktSentenceTokenizer
A sentence tokenizer which uses an unsupervised algorithm to build a model for abbreviation words, collocations, and words that start sentences; and then uses that model to find sentence boundaries. This approach has been shown to work well for many European languages.
Here is some demo code in Python:
---------------------------------------------------------------------------- -----
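import nltk.data
# Load the pre-trained English Punkt model (requires the NLTK "punkt" data).
tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
# Read the whole input file into one string; the file is closed automatically.
with open("test.txt") as fp:
    data = fp.read()
# Print the sentences, separated by a dashed line.
print('\n-----\n'.join(tokenizer.tokenize(data)))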
---------------------------------------------------------------------------- -----
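If you would rather not depend on the pickled English model, the class can also be instantiated directly. The following is only a rough sketch under a few assumptions: split_sentences is a helper name I made up for illustration, and the optional train_text argument stands for raw text from your own domain that Punkt could learn abbreviations from. Without training text the tokenizer falls back to its default parameters, so abbreviations such as "Dr." may be split incorrectly.

```python
from nltk.tokenize.punkt import PunktSentenceTokenizer


def split_sentences(text, train_text=None):
    """Split text into sentences with Punkt (hypothetical helper).

    train_text is optional raw text used for unsupervised training;
    when it is None, the tokenizer uses its default, untrained parameters.
    """
    tokenizer = PunktSentenceTokenizer(train_text)
    return tokenizer.tokenize(text)


sentences = split_sentences(
    "I am looking for a sentence splitter. Can anyone help me?"
)
print('\n-----\n'.join(sentences))
```

For real use you would probably pass a large sample of your own corpus as train_text, since the quality of the learned abbreviation list depends on how much text the model has seen.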
_____
From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of Afzal, Naveed
Sent: Monday, October 29, 2007 5:48 PM
To: corpora at uib.no
Subject: [Corpora-List] Sentence Splitter tool
I am looking for a sentence splitter tool. Can anyone help me out with this?
Thanks,
Naveed