[Corpora-List] Penn Treebank annotated with chunks

Thomas Proisl tsproisl at linguistik.uni-erlangen.de
Mon Aug 27 10:43:22 CEST 2012

Hi Aleksandar,

There is a Perl script by Sabine Buchholz that converts parsed sentences into chunks. It was used to generate the data for the CoNLL-2000 Shared Task on chunking.
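
To give a rough idea of what a deterministic tree-to-chunk conversion can look like, here is a much-simplified sketch in Python. It is not the Perl script mentioned above and does not reproduce its rules: it only assumes that each token takes its chunk type from its immediate parent phrase and that tags are written in a B-/I-/O scheme. The names `parse`, `chunk_tags` and `CHUNK_LABELS` are made up for this illustration.

```python
import re

# Phrase labels treated as chunk types (an assumption of this sketch).
CHUNK_LABELS = {"NP", "VP", "PP", "ADJP", "ADVP"}

def parse(bracketed):
    """Parse a Penn-Treebank-style bracketed string into (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", bracketed)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])   # a word
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return node()

def chunk_tags(tree):
    """Return (word, POS, chunk tag) triples, one per token."""
    rows = []
    prev_parent = None

    def walk(node, parent):
        nonlocal prev_parent
        label, children = node
        if len(children) == 1 and isinstance(children[0], str):
            # Preterminal: the chunk type comes from the parent phrase,
            # with function tags such as -SBJ stripped.
            ptype = parent[0].split("-")[0] if parent else ""
            if ptype in CHUNK_LABELS:
                # Continue the chunk only if the previous token had the same parent node.
                prefix = "I" if parent is prev_parent else "B"
                tag = f"{prefix}-{ptype}"
            else:
                tag = "O"
            rows.append((children[0], label, tag))
            prev_parent = parent
        else:
            for child in children:
                walk(child, node)

    walk(tree, None)
    return rows

if __name__ == "__main__":
    sentence = "(S (NP (DT The) (NN cat)) (VP (VBD sat) (PP (IN on) (NP (DT the) (NN mat)))))"
    for word, pos_tag, chunk in chunk_tags(parse(sentence)):
        print(word, pos_tag, chunk)
```

A real conversion has to handle many more cases (traces, punctuation, coordination, nested noun phrases), so this only shows the general shape of the problem.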


Best regards, Thomas

On Mon, 13 Aug 2012 13:52:08 +0100, Aleksandar Savkov <cytehuop at gmail.com> wrote:

> Hello everybody,
> I'm looking for a chunk-annotated version of the Penn Treebank. It
> seems to be the most popular resource for training and testing
> chunking software, but I haven't been able to find a chunked version
> or an algorithm for extracting chunks in a deterministic way. Is
> there a standard resource that everybody uses or does everybody just
> extract the chunks from the parsed data themselves?
> Best,
> Aleksandar Savkov

--
Department Germanistik und Komparatistik
Professur für Computerlinguistik
Bismarckstr. 6, 91054 Erlangen

Institut für Anglistik und Amerikanistik
Lehrstuhl für Anglistik, insbesondere Linguistik
Bismarckstr. 1, 91054 Erlangen

Phone: +49 9131 85-25908; Fax: +49 9131 85-29251
http://www.linguistik.uni-erlangen.de/~tsproisl/
