[Corpora-List] Penn Treebank annotated with chunks

Alexander Yeh asy at mitre.org
Tue Aug 14 01:38:26 CEST 2012


http://www.clips.ua.ac.be/pages/mbsp-tags

- Describes a set of chunk tags and possibly some chunk-finding
  programs.

http://www.cnts.ua.ac.be/conll2000/chunking/

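- Describes a past CoNLL evaluation (the CoNLL-2000 shared task) on text
  chunking, which covered noun and verb chunks as well as other chunk
  types. It has links to data sets based on the WSJ portion of the Penn
  Treebank, as well as a script for generating the data sets from WSJ.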
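For reference, the CoNLL-2000 data files put one token per line as "word POS chunk-tag", with chunk tags in a B-X / I-X / O scheme and a blank line between sentences. Below is a minimal sketch, assuming exactly that layout, that groups such lines back into chunks. The function names are hypothetical and the sample lines are made up for illustration, not copied from the data. NLTK also includes a reader for this corpus (nltk.corpus.conll2000), which is probably the easier route if NLTK is already in use.

```python
from typing import List, Tuple

# A chunk is (chunk type, [(word, POS), ...]); tokens tagged O become one-token "O" chunks.
Chunk = Tuple[str, List[Tuple[str, str]]]


def read_conll2000_sentence(lines: List[str]) -> List[Chunk]:
    """Group the 'word POS chunk-tag' lines of one sentence into chunks."""
    chunks: List[Chunk] = []
    for line in lines:
        word, pos, tag = line.split()
        if tag == "O":
            chunks.append(("O", [(word, pos)]))
            continue
        prefix, ctype = tag.split("-", 1)
        # Start a new chunk on B-, or on an I- that does not continue a chunk of the same type
        if prefix == "B" or not chunks or chunks[-1][0] != ctype:
            chunks.append((ctype, [(word, pos)]))
        else:
            chunks[-1][1].append((word, pos))
    return chunks


def read_conll2000(text: str) -> List[List[Chunk]]:
    """Split file contents into blank-line-separated sentences and chunk each one."""
    sentences: List[List[Chunk]] = []
    current: List[str] = []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            sentences.append(read_conll2000_sentence(current))
            current = []
    if current:
        sentences.append(read_conll2000_sentence(current))
    return sentences


if __name__ == "__main__":
    # Illustrative lines in the CoNLL-2000 layout (tags chosen by hand)
    sample = "Pierre NNP B-NP\nVinken NNP I-NP\n, , O\nwill MD B-VP\njoin VB I-VP\n"
    for ctype, tokens in read_conll2000(sample)[0]:
        print(ctype, tokens)
```

An I- tag that does not continue a chunk of the same type is treated as the start of a new chunk, which is one common convention for repairing ill-formed tag sequences.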

Thanks -Alex Yeh

Steven Bird wrote:
> Aleksandar,
>
> The "tagged" section of the Penn Treebank has chunks marked with brackets, e.g.:
>
> [ Pierre/NNP Vinken/NNP ]
> ,/,
> [ 61/CD years/NNS ]
> old/JJ ,/, will/MD join/VB
> [ the/DT board/NN ]
> as/IN
> [ a/DT nonexecutive/JJ director/NN Nov./NNP 29/CD ]
> ./.
>
> The NLTK corpus readers give access to some chunked corpora:
> http://nltk.googlecode.com/svn/trunk/doc/howto/corpus.html#chunked-corpora
>
> NLTK doesn't provide an interface to the chunked version of the treebank
> data, but it could be added if there were interest in this.
>
> -Steven Bird
>
> On 13 August 2012 22:52, Aleksandar Savkov <cytehuop at gmail.com> wrote:
>> Hello everybody,
>>
>> I'm looking for a chunk-annotated version of the Penn Treebank. It seems to
>> be the most popular resource for training and testing chunking software, but
>> I haven't been able to find a chunked version or an algorithm for extracting
>> chunks in a deterministic way. Is there a standard resource that everybody
>> uses or does everybody just extract the chunks from the parsed data
>> themselves?
>>
>> Best,
>> Aleksandar Savkov
>>
>> _______________________________________________
>> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>> Corpora mailing list
>> Corpora at uib.no
>> http://mailman.uib.no/listinfo/corpora
>>
>
>
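
The bracketed "tagged" format in Steven's example can be turned into per-token B-/I-/O chunk tags mechanically. Below is a minimal sketch, assuming "[" and "]" are separate whitespace-delimited tokens and that every bracketed group is a noun-phrase chunk; the function name brackets_to_iob and the NP default label are my own assumptions. The last group in the example, "a nonexecutive director Nov. 29", suggests these brackets are not always clean base NPs, so they are worth checking before training on them.

```python
from typing import List, Tuple


def brackets_to_iob(text: str, chunk_type: str = "NP") -> List[Tuple[str, str, str]]:
    """Convert bracketed Penn Treebank 'tagged' text into (word, POS, chunk tag) triples.

    Assumes '[' and ']' are separate whitespace-delimited tokens and labels
    every bracketed group as chunk_type (NP by default, an assumption).
    """
    triples: List[Tuple[str, str, str]] = []
    inside = False      # currently between '[' and ']'
    at_start = False    # next word token is the first one in the current group
    for token in text.split():
        if token == "[":
            inside, at_start = True, True
        elif token == "]":
            inside = False
        elif "/" not in token:
            continue    # skip tokens that are not word/TAG pairs (e.g. separator lines)
        else:
            # Split on the last '/', so an escaped slash inside a word (like 1\/2) stays in the word
            word, pos = token.rsplit("/", 1)
            if inside:
                triples.append((word, pos, ("B-" if at_start else "I-") + chunk_type))
                at_start = False
            else:
                triples.append((word, pos, "O"))
    return triples


if __name__ == "__main__":
    sample = """[ Pierre/NNP Vinken/NNP ]
,/,
[ 61/CD years/NNS ]
old/JJ ,/, will/MD join/VB
[ the/DT board/NN ]
as/IN
[ a/DT nonexecutive/JJ director/NN Nov./NNP 29/CD ]
./."""
    for triple in brackets_to_iob(sample):
        print(*triple)
```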

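For the parsed files, one simple deterministic heuristic is to treat every lowest NP (an NP node containing no other NP) as a chunk and tag all other tokens O. Below is a minimal, self-contained sketch with a tiny bracket-tree parser; all function names are hypothetical. It drops -NONE- empty elements, strips function tags such as -SBJ or -TMP, and produces NP chunks only. It is not the procedure used by the script linked from the CoNLL-2000 page, which also produces other chunk types and handles more special cases. The sample tree in the demo is an illustrative rendering of the same Pierre Vinken sentence.

```python
from typing import List, Tuple

# A node is (label, children); each child is either a nested node or a word string.
Node = Tuple[str, list]


def parse_tree(s: str) -> Node:
    """Parse one bracketed Penn Treebank tree, e.g. '( (S (NP (DT the) ...)) )'."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse() -> Node:
        nonlocal pos
        pos += 1                                   # skip '('
        label = ""
        if tokens[pos] not in ("(", ")"):          # the outermost bracket is often unlabeled
            label = tokens[pos]
            pos += 1
        children: list = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse())
            else:
                children.append(tokens[pos])       # a word under a POS node
                pos += 1
        pos += 1                                   # skip ')'
        return (label, children)

    return parse()


def is_preterminal(node: Node) -> bool:
    return len(node[1]) == 1 and isinstance(node[1][0], str)


def is_np(label: str) -> bool:
    # Strip function tags and indices: NP-SBJ-1 -> NP, NP=2 -> NP
    return label.split("-")[0].split("=")[0] == "NP"


def contains_np(node: Node) -> bool:
    """True if some descendant (not the node itself) is an NP."""
    return any(isinstance(c, tuple) and (is_np(c[0]) or contains_np(c)) for c in node[1])


def tagged_leaves(node: Node) -> List[Tuple[str, str]]:
    """(word, POS) pairs under node, dropping -NONE- empty elements."""
    if is_preterminal(node):
        return [] if node[0] == "-NONE-" else [(node[1][0], node[0])]
    return [leaf for c in node[1] for leaf in tagged_leaves(c)]


def base_np_iob(tree: Node) -> List[Tuple[str, str, str]]:
    """Tag each token B-NP/I-NP if it lies in a lowest NP, else O."""
    out: List[Tuple[str, str, str]] = []

    def walk(node: Node) -> None:
        if is_preterminal(node):
            out.extend((w, p, "O") for w, p in tagged_leaves(node))
        elif is_np(node[0]) and not contains_np(node):
            for i, (w, p) in enumerate(tagged_leaves(node)):
                out.append((w, p, "B-NP" if i == 0 else "I-NP"))
        else:
            for child in node[1]:
                walk(child)

    walk(tree)
    return out


if __name__ == "__main__":
    sample = ("( (S (NP-SBJ (NP (NNP Pierre) (NNP Vinken)) (, ,) (ADJP (NP (CD 61) (NNS years)) "
              "(JJ old)) (, ,)) (VP (MD will) (VP (VB join) (NP (DT the) (NN board)) (PP-CLR (IN as) "
              "(NP (DT a) (JJ nonexecutive) (NN director))) (NP-TMP (NNP Nov.) (CD 29)))) (. .)) )")
    for triple in base_np_iob(parse_tree(sample)):
        print(*triple)
```

On that sentence the heuristic yields five NP chunks and, unlike the bracketed tagged file, keeps "a nonexecutive director" and "Nov. 29" apart. One known weakness: for a possessive such as (NP (NP (NNP John) (POS 's)) (NN dog)), only "John 's" becomes a chunk and "dog" is tagged O.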