[Corpora-List] The Preposition Project (TPP) and new preposition corpora

Ken Litkowski ken at clres.com
Wed Nov 14 20:07:08 CET 2012


In my efforts to understand preposition behavior, I have assembled two new corpora: (1) 7,500 sentences exemplifying each preposition sense in TPP (from Oxford, up to 20 each, for 300 preps) and (2) 48,000 sentences constituting a representative sample for 272 preps drawn from the BNC, with >=250 sentences for each of 140 preps (these are not currently sense-tagged). These corpora add to the corpus of 25,000+ sentences created for the SemEval 2007 prep WSD task for the 34 most common preps. The BNC corpus was developed with the aid of Patrick Hanks, with the intent of extending his corpus pattern analysis for verbs to preps (particularly to develop ontological characterizations of prep complements and governors).

Since analysis of these corpora clearly involves a great deal of work, I want to make them available to the wider community in the hope of making more rapid progress in characterizing prep behavior. I am trying to build on the considerable amount of lexicographic work that went into TPP, taking into account how these data might be linked to FrameNet's frame elements (e.g., the FE taxonomy) and to other substantial lexical resources (WordNet, VerbNet, and PropBank). I envision the need for appropriate ML technologies, dependency parsing, and linguistic insights. It is my hope that this work will contribute substantially to research in such NLP areas as QA, Summarization, and RTE.

More details are available at my web site on TPP <http://www.clres.com/prepositions.html>, the Online TPP <http://www.clres.com/cgi-bin/onlineTPP/find_prep.cgi>, next steps for TPP <http://www.clres.com/online-papers/NextTPPSteps.pdf>, and corpus pattern analysis for preps <http://www.clres.com/online-papers/CPAPreps.pdf>. I am working to bring this scattered material, along with the corpora, to an easily accessible repository. In the meantime, please direct your comments and inquiries to me.

Ken Litkowski

--
Ken Litkowski                     TEL.: 301-482-0237
CL Research                       EMAIL: ken at clres.com
9208 Gue Road                     Home Page: http://www.clres.com
Damascus, MD 20872-1025 USA       Blog: http://www.clres.com/blog
