[Corpora-List] Penn Treebank annotated with chunks

Steven Bird sb at csse.unimelb.edu.au
Mon Aug 13 23:58:26 CEST 2012


The "tagged" section of Penn Treebank has chunks marked with brackets, e.g.:

[ Pierre/NNP Vinken/NNP ] ,/, [ 61/CD years/NNS ] old/JJ ,/, will/MD join/VB [ the/DT board/NN ] as/IN [ a/DT nonexecutive/JJ director/NN Nov./NNP 29/CD ] ./.
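
In case it is useful, here is a minimal sketch (not NLTK code) of how a line in this format could be turned into per-token B/I/O chunk labels. The function name brackets_to_iob is hypothetical, and the sketch assumes that brackets are separated from tokens by whitespace, that chunks do not nest, and that each input string holds one sentence; the real files may need extra handling for separator lines and sentence boundaries.

```python
def brackets_to_iob(line, sep="/"):
    """Convert one bracketed, tagged line into (word, tag, iob) triples."""
    triples = []
    inside = False  # True while between "[" and "]"
    first = False   # True until the first token of the open chunk is emitted
    for item in line.split():
        if item == "[":
            if inside:
                raise ValueError("nested '[' not expected")
            inside, first = True, True
        elif item == "]":
            if not inside:
                raise ValueError("']' without matching '['")
            inside = False
        else:
            # Split on the last separator so words containing "/" survive.
            word, found, tag = item.rpartition(sep)
            if not found:
                raise ValueError(f"token without a tag: {item!r}")
            if inside:
                iob = "B" if first else "I"
                first = False
            else:
                iob = "O"
            triples.append((word, tag, iob))
    if inside:
        raise ValueError("unclosed '['")
    return triples


sample = ("[ Pierre/NNP Vinken/NNP ] ,/, [ 61/CD years/NNS ] old/JJ ,/, "
          "will/MD join/VB [ the/DT board/NN ] as/IN "
          "[ a/DT nonexecutive/JJ director/NN Nov./NNP 29/CD ] ./.")

for word, tag, iob in brackets_to_iob(sample):
    print(word, tag, iob)
```

The labels carry no chunk type (such as NP), because the brackets in the tagged files do not name one.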

The NLTK corpus readers give access to some chunked corpora: http://nltk.googlecode.com/svn/trunk/doc/howto/corpus.html#chunked-corpora

NLTK doesn't provide an interface to the chunked version of the treebank data, but one could be added if there were interest.

-Steven Bird

On 13 August 2012 22:52, Aleksandar Savkov <cytehuop at gmail.com> wrote:
> Hello everybody,
> I'm looking for a chunk-annotated version of the Penn Treebank. It seems to
> be the most popular resource for training and testing chunking software, but
> I haven't been able to find a chunked version or an algorithm for extracting
> chunks in a deterministic way. Is there a standard resource that everybody
> uses or does everybody just extract the chunks from the parsed data
> themselves?
> Best,
> Aleksandar Savkov