[Corpora-List] CoNLL to Penn TreeBank conversion

Djamé Seddah djame.seddah at free.fr
Wed Dec 12 20:03:45 CET 2012


Hi, would it be an option to use a constituency parser directly (trained on the small Cast3LB treebank)? If so, the PCFG-LA grammars and morphological models we generated last summer can be made available quite quickly. See "Statistical Parsing of Spanish and Data Driven Lemmatization" (Le Roux, Sagot and Seddah, SPMRL 2012).

Note that you'll probably need a function labeler if you want complete parses (e.g., with function labels).

Best,
Djamé

PS: there was work by M. Collins and colleagues on Czech, and a whole line of work on dependency-to-constituent conversion is actively pursued by Fei Xia and colleagues. Long story short: it's really not easy. Why not use the AnCora corpus, which is free, quite large, and provides both constituents and dependencies?
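To illustrate why the conversion is not easy, here is a minimal Python sketch of the most naive approach. It is not taken from the paper or tools mentioned in this thread. It assumes CoNLL-X style input (10 tab-separated columns, with ID in column 1, FORM in column 2, CPOSTAG in column 4, HEAD in column 7) and a projective tree. The function names, the sample sentence and its tags, and the phrase labels (coarse POS tag plus "P") are all hypothetical choices made for this example.

```python
def parse_conll(block):
    """Read one sentence in CoNLL-X layout (assumed: 10 tab-separated columns)."""
    tokens = []
    for line in block.strip().splitlines():
        cols = line.split("\t")
        tokens.append({
            "id": int(cols[0]),    # token index, starting at 1
            "form": cols[1],       # word form
            "cpos": cols[3],       # coarse POS tag
            "head": int(cols[6]),  # index of the head token, 0 for the root
        })
    return tokens


def to_brackets(tokens):
    """Naively project each head into a bracket (assumes a projective tree)."""
    children = {}
    for tok in tokens:
        children.setdefault(tok["head"], []).append(tok)

    def build(tok):
        leaf = "({} {})".format(tok["cpos"], tok["form"])
        deps = children.get(tok["id"], [])
        if not deps:
            return leaf
        # Place the head among its dependents in surface order.
        parts = sorted(deps + [tok], key=lambda t: t["id"])
        inner = " ".join(leaf if p is tok else build(p) for p in parts)
        # Hypothetical label: coarse POS tag of the head plus "P".
        return "({}P {})".format(tok["cpos"], inner)

    roots = children.get(0, [])
    return "(ROOT {})".format(" ".join(build(r) for r in roots))


# Hypothetical three-token sentence; tags and relations are illustrative only.
ROWS = [
    ["1", "El", "el", "d", "d", "_", "2", "spec", "_", "_"],
    ["2", "gato", "gato", "n", "n", "_", "3", "suj", "_", "_"],
    ["3", "duerme", "dormir", "v", "v", "_", "0", "sentence", "_", "_"],
]
SAMPLE = "\n".join("\t".join(row) for row in ROWS)

print(to_brackets(parse_conll(SAMPLE)))
# prints (ROOT (vP (nP (d El) (n gato)) (v duerme)))
```

The output is bracketed like a Penn Treebank tree, but it is only the dependency tree in another notation: every head gets a single flat projection, the labels are invented, there are no intermediate phrase levels or empty categories, and non-projective input would produce brackets whose yield is out of order. Closing that gap is the hard part.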

On 13 Dec 2012, at 00:14, Josep M. Fontana wrote:


> Does anybody know about any tool, script, etc. to convert from column-based CoNLL format to the Penn Treebank annotation style? I know there is the srlconll software package that does exactly the opposite but I haven't been able to find anything that does the conversion in the direction we need.
>
> We have access to parsers that are trained for Spanish but the output is in the CoNLL format. To be able to use tools to exploit the syntactic annotation we need it to be in the Penn TreeBank format.
>
> Any help will be greatly appreciated.
>
>
> Josep M.


