[Corpora-List] Apply Coreference Resolution in Wikipedia

Gerber Daniel dgerber at informatik.uni-leipzig.de
Fri Apr 20 12:55:44 CEST 2012

Hello, I'm currently working on a distant supervision approach for relation extraction. I'm using English Wikipedia articles to find sentences that contain labels of resources, for example a resource's name such as "Barack Obama". My problem is that this string occurs only in the first couple of sentences of an article and is then replaced by pronouns or by descriptions like "The president ...". So what I want to do is apply coreference resolution to the complete English Wikipedia (ideally also to other languages, such as German) and replace those substitutes with the resource name.
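
To make the goal concrete, here is a minimal sketch of the substitution step alone. It assumes that some coreference resolver (not shown) has already returned character-offset spans for the mentions that refer to the resource. The function name `replace_mentions`, the span format, and the example sentence are assumptions made for illustration, not the output of any particular library.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def replace_mentions(text: str, mentions: List[Span], label: str) -> str:
    """Replace every mention span in `text` with `label`.

    Assumes the spans do not overlap.
    """
    result = text
    # Work from the last span to the first so earlier offsets stay valid.
    for start, end in sorted(mentions, reverse=True):
        result = result[:start] + label + result[end:]
    return result


text = "Barack Obama was born in Hawaii. He was elected in 2008."
# Hypothetical resolver output: the span of the pronoun "He".
he_start = text.index("He was")
mentions = [(he_start, he_start + 2)]
resolved = replace_mentions(text, mentions, "Barack Obama")
print(resolved)
```

Possessive pronouns ("his") and overlapping or nested mentions would need extra handling that this sketch leaves out.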

Is there a corpus like this already available? If not, would I need to write this myself (using some library), or are there applications available that can do this? Also, what would be a good library for this task in terms of speed and accuracy? I came across the Illinois Coreference Package, StanfordNLP, and OpenNLP, but I can't afford to try them all. :/

I would be very grateful for any suggestions!

Kind regards, Daniel
