Hi Naveed,

NLTK provides a class named PunktSentenceTokenizer for sentence splitting. Its documentation describes it as follows:

Class PunktSentenceTokenizer

A sentence tokenizer which uses an unsupervised algorithm to build a model for abbreviation words, collocations, and words that start sentences; and then uses that model to find sentence boundaries. This approach has been shown to work well for many European languages.
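To see why a model of abbreviations matters for finding sentence boundaries, here is a minimal sketch in plain Python. It is not the Punkt algorithm: the abbreviation set is hand-written for illustration, whereas Punkt learns it from raw text, and the function name split_sentences is just a hypothetical helper.

```python
import re

# Hand-written abbreviation list (an assumption for this sketch);
# Punkt would learn such words from unlabeled text instead.
ABBREVIATIONS = {"dr", "mr", "mrs", "e.g", "i.e", "etc"}

def split_sentences(text):
    """Split text at ., ! or ? followed by whitespace, skipping known abbreviations."""
    sentences = []
    start = 0
    for match in re.finditer(r"[.!?]\s+", text):
        # Look at the word immediately before the punctuation mark.
        words = text[start:match.start()].split()
        last_word = words[-1].lower() if words else ""
        # A period after a known abbreviation is not a sentence boundary.
        if text[match.start()] == "." and last_word in ABBREVIATIONS:
            continue
        sentences.append(text[start:match.start() + 1].strip())
        start = match.end()
    # Whatever remains after the last boundary is the final sentence.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences

print(split_sentences("Dr. Smith arrived. He was late."))
```

Without the abbreviation check, the period after "Dr" would be treated as the end of a sentence. A fixed list like this one misses any abbreviation it does not contain, which is the gap an unsupervised model is meant to close.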

Here is some demo code in Python:

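import nltk.data

# Load the pre-trained English Punkt model from the NLTK data directory.
tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')

# Read the whole input file; the with-block closes it afterwards.
with open("test.txt") as fp:
    data = fp.read()

# Print each detected sentence, separated by a divider line.
print('\n-----\n'.join(tokenizer.tokenize(data)))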

I am looking for a sentence splitter tool. Can anyone help me out with this?



