The "tagged" section of the Penn Treebank has chunks marked with brackets, e.g.:
[ Pierre/NNP Vinken/NNP ] ,/, [ 61/CD years/NNS ] old/JJ ,/, will/MD join/VB [ the/DT board/NN ] as/IN [ a/DT nonexecutive/JJ director/NN Nov./NNP 29/CD ] ./.
The NLTK corpus readers give access to some chunked corpora: http://nltk.googlecode.com/svn/trunk/doc/howto/corpus.html#chunked-corpora
NLTK doesn't give an interface to the chunked version of the treebank data, but one could be added if there were interest in this.
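
The bracketed format shown above is simple enough to read by hand. Here is a rough sketch in plain Python, assuming brackets are whitespace-separated tokens, are never nested, and that every other token is word/TAG. The function names parse_chunked and to_iob are made up for illustration and are not part of NLTK. The brackets carry no chunk type, so the B/I/O labels below are untyped.

```python
def parse_chunked(line):
    """Parse one bracketed, tagged line.

    Returns a list whose items are either a (word, tag) tuple for a token
    outside any bracket, or a list of (word, tag) tuples for one chunk.
    Assumes brackets are separate tokens and are not nested.
    """
    items = []
    chunk = None  # the chunk being filled while inside [ ... ], else None
    for tok in line.split():
        if tok == "[":
            if chunk is not None:
                raise ValueError("nested '[' is not expected in this format")
            chunk = []
        elif tok == "]":
            if chunk is None:
                raise ValueError("']' without a matching '['")
            items.append(chunk)
            chunk = None
        else:
            # Split on the last slash so a word that contains '/' stays whole.
            word, _, tag = tok.rpartition("/")
            if chunk is not None:
                chunk.append((word, tag))
            else:
                items.append((word, tag))
    if chunk is not None:
        raise ValueError("'[' without a matching ']'")
    return items


def to_iob(items):
    """Flatten parse_chunked() output into (word, tag, label) rows.

    The label is "B" for the first token of a chunk, "I" for the rest of
    the chunk, and "O" for tokens outside any chunk.
    """
    rows = []
    for item in items:
        if isinstance(item, list):
            for i, (word, tag) in enumerate(item):
                rows.append((word, tag, "B" if i == 0 else "I"))
        else:
            word, tag = item
            rows.append((word, tag, "O"))
    return rows


sample = "[ Pierre/NNP Vinken/NNP ] ,/, [ 61/CD years/NNS ] old/JJ ,/, will/MD join/VB [ the/DT board/NN ] ./."
for row in to_iob(parse_chunked(sample)):
    print(*row)
```

This prints one token per line with its tag and chunk label, which is close to the column layout many chunkers take as training input. Real files may contain header or separator lines that this sketch does not handle.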
On 13 August 2012 22:52, Aleksandar Savkov <cytehuop at gmail.com> wrote:
> Hello everybody,
> I'm looking for a chunk-annotated version of the Penn Treebank. It seems to
> be the most popular resource for training and testing chunking software, but
> I haven't been able to find a chunked version or an algorithm for extracting
> chunks in a deterministic way. Is there a standard resource that everybody
> uses, or does everybody just extract the chunks from the parsed data?
> Aleksandar Savkov