[Corpora-List] Penn Treebank annotated with chunks

Aleksandar Savkov cytehuop at gmail.com
Tue Aug 14 10:33:45 CEST 2012


Thanks, I think Steven's remark about the chunks in the original version is what I was looking for. I'll just have to find that version of the treebank.

Best, Alex

On 14 August 2012 00:38, Alexander Yeh <asy at mitre.org> wrote:


> http://www.clips.ua.ac.be/pages/mbsp-tags
> - Describes a set of chunk tags and possibly some chunk finding
> programs
>
> http://www.cnts.ua.ac.be/conll2000/chunking/
> - Describes a past CoNLL evaluation on noun and verb chunking.
> It has some links to data sets based on WSJ as well as a script for
> generating the data sets from WSJ.
>
> Thanks
> -Alex Yeh
>
> Steven Bird wrote:
>
>> Aleksandar,
>>
>> The "tagged" section of Penn Treebank has chunks marked with brackets,
>> e.g.:
>>
>> [ Pierre/NNP Vinken/NNP ]
>> ,/,
>> [ 61/CD years/NNS ]
>> old/JJ ,/, will/MD join/VB
>> [ the/DT board/NN ]
>> as/IN
>> [ a/DT nonexecutive/JJ director/NN Nov./NNP 29/CD ]
>> ./.
>>
>> The NLTK corpus readers give access to some chunked corpora:
>> http://nltk.googlecode.com/svn/trunk/doc/howto/corpus.html#chunked-corpora
>>
>> NLTK doesn't give an interface to the chunked version of the treebank
>> data, but it could be added if there was interest in this.
>>
>> -Steven Bird
>>
>> On 13 August 2012 22:52, Aleksandar Savkov <cytehuop at gmail.com> wrote:
>>
>>> Hello everybody,
>>>
>>> I'm looking for a chunk-annotated version of the Penn Treebank. It seems to
>>> be the most popular resource for training and testing chunking software, but
>>> I haven't been able to find a chunked version or an algorithm for extracting
>>> chunks in a deterministic way. Is there a standard resource that everybody
>>> uses, or does everybody just extract the chunks from the parsed data
>>> themselves?
>>>
>>> Best,
>>> Aleksandar Savkov
>>>
>>> _______________________________________________
>>> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>>> Corpora mailing list
>>> Corpora at uib.no
>>> http://mailman.uib.no/listinfo/corpora
>>>
>>>
>
>
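
The bracketed format in Steven's example is simple enough to read without a dedicated corpus reader. Below is a minimal sketch in Python, not an official tool: the function name bracketed_to_iob is made up for illustration, and labelling every bracketed span "NP" is an assumption (the brackets in the tagged files carry no label). It emits one (word, POS, chunk tag) row per token, with B-/I-/O tags in the style of the CoNLL-2000 chunking data.

```python
# Sample text copied from Steven's example of the Treebank "tagged" format.
SAMPLE = """
[ Pierre/NNP Vinken/NNP ]
,/,
[ 61/CD years/NNS ]
old/JJ ,/, will/MD join/VB
[ the/DT board/NN ]
as/IN
[ a/DT nonexecutive/JJ director/NN Nov./NNP 29/CD ]
./.
"""


def bracketed_to_iob(text, label="NP"):
    """Convert bracketed word/TAG text into (word, tag, chunk_tag) rows.

    Hypothetical helper. Assumes brackets are whitespace-separated,
    never nested, and that each token has the form word/TAG.
    """
    rows = []
    inside = False   # currently between "[" and "]"
    first = False    # next token is the first one of a chunk
    for item in text.split():
        if item == "[":
            inside, first = True, True
        elif item == "]":
            inside = False
        else:
            # Split on the LAST slash so words containing "/" stay whole.
            word, _, tag = item.rpartition("/")
            if inside:
                chunk = ("B-" if first else "I-") + label
                first = False
            else:
                chunk = "O"
            rows.append((word, tag, chunk))
    return rows


if __name__ == "__main__":
    for word, tag, chunk in bracketed_to_iob(SAMPLE):
        print(word, tag, chunk)
```

Splitting on the last slash is deliberate, since a word may itself contain a slash while the tag does not. Real Treebank files also contain separator lines and other markup that this sketch does not handle.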