[Corpora-List] Converting Stanford typed dependencies to universal dependencies for English

Amir Zeldes Amir.Zeldes at georgetown.edu
Thu Sep 3 15:45:15 CEST 2015


Hi Richard,

Thanks, that's exactly the sort of thing I'm looking for! It's trying to convert PTB brackets to dependencies first, but that can be skipped so it's no problem (my data is native dependencies).

I can see where it's converting all of the POS tags, changing the labels, and also doing the conditional changes such as the different types of 'to'. What I'm not seeing yet is the re-wiring of dependency edges for things like 'case' or 'name', but maybe I'm just missing it. If you or anyone else knows more about that, I'd appreciate a message off-list.
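
A minimal sketch of the kind of edge re-wiring meant here, for the 'case' relation only. It assumes basic (uncollapsed) Stanford dependencies, where a preposition attaches to its governor as prep and the noun attaches to the preposition as pobj, and it assumes the UD v1 target in which the noun attaches to the governor as nmod and the preposition becomes a case dependent of the noun. The token layout (dicts with 'id', 'form', 'head', 'deprel') and the function name rewire_case are hypothetical, chosen for illustration; this is not the code from the treebank package.

```python
def rewire_case(tokens):
    """Re-wire prep/pobj pairs (basic Stanford) into nmod/case (assumed UD v1).

    tokens: list of dicts with keys 'id', 'form', 'head', 'deprel'.
    Returns a new list; the input tokens are not modified.
    """
    original = {t["id"]: t for t in tokens}          # read-only view of the input
    converted = {t["id"]: dict(t) for t in tokens}   # copies that get re-wired

    for tok in tokens:
        if tok["deprel"] != "pobj":
            continue
        prep = original.get(tok["head"])
        if prep is None or prep["deprel"] != "prep":
            continue  # pobj without a prep parent: leave it alone

        # The noun takes over the preposition's old attachment point ...
        converted[tok["id"]]["head"] = prep["head"]
        converted[tok["id"]]["deprel"] = "nmod"
        # ... and the preposition becomes a 'case' dependent of the noun.
        converted[prep["id"]]["head"] = tok["id"]
        converted[prep["id"]]["deprel"] = "case"

    return [converted[t["id"]] for t in tokens]


# "She sat on the mat" in basic Stanford dependencies (hand-built example).
sentence = [
    {"id": 1, "form": "She", "head": 2, "deprel": "nsubj"},
    {"id": 2, "form": "sat", "head": 0, "deprel": "root"},
    {"id": 3, "form": "on",  "head": 2, "deprel": "prep"},
    {"id": 4, "form": "the", "head": 5, "deprel": "det"},
    {"id": 5, "form": "mat", "head": 3, "deprel": "pobj"},
]

for t in rewire_case(sentence):
    print(t["id"], t["form"], t["head"], t["deprel"])
```

This only covers the simplest prep/pobj pattern; multi-word prepositions, pcomp, and the 'name' relation would each need their own rules.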

Thanks again, Amir

-----Original Message-----
From: Richard Eckart de Castilho [mailto:eckart at ukp.informatik.tu-darmstadt.de]
Sent: Tuesday, September 01, 2015 02:59
To: Amir Zeldes
Cc: corpora at uib.no
Subject: Re: [Corpora-List] Converting Stanford typed dependencies to universal dependencies for English

Hi Amir,

since the English data used in the Universal Dependency Treebank cannot be freely distributed, they include code to automatically tag/convert it. If I remember correctly, it uses an old version of the Stanford Parser and applies a transformation to the universal categories.

https://github.com/ryanmcd/uni-dep-tb

You can find the code in the "std/en" folder of universal_treebanks_v2.0.tar.gz.

I didn't try it, but it might be what you are looking for.

Cheers,

-- Richard

On 31.08.2015, at 21:37, Amir Zeldes <Amir.Zeldes at georgetown.edu> wrote:


> Hi everyone,
>
> I'm wondering if anybody has or knows of a script for converting Stanford Typed Dependencies to the Universal Dependencies scheme for English. I realize it's non-trivial because of the different handling of prepositions, case, names etc. but I think with some heuristics a good baseline solution might be possible. Has anyone worked on a tool to do this automatically?
>
> Thanks,
> Amir

--
-------------------------------------------------------------------
Dr. Richard Eckart de Castilho
Technical Lead
Ubiquitous Knowledge Processing (UKP) Lab
FB 20 / Computer Science Department
Technische Universität Darmstadt
Hochschulstr. 10, D-64289 Darmstadt, Germany
phone [+49] (0)6151 16-7477, fax -5455, room S2/02/B117
eckart at ukp.informatik.tu-darmstadt.de
www.ukp.tu-darmstadt.de

Web Research at TU Darmstadt (WeRC): www.werc.tu-darmstadt.de
GRK 1994: Adaptive Preparation of Information from Heterogeneous Sources (AIPHES): www.aiphes.tu-darmstadt.de
PhD program: Knowledge Discovery in Scientific Literature (KDSL) www.kdsl.tu-darmstadt.de
-------------------------------------------------------------------


