[Corpora-List] Querying Dependency-Annotated Corpora

Siva Reddy siva at sivareddy.in
Mon Aug 6 15:56:45 CEST 2012


Hi Niels,

Sketch Engine (http://sketchengine.co.uk) now supports querying dependency trees represented in the CoNLL format (MaltParser output is in CoNLL format). Word sketches (collocation profiles) and a thesaurus can also be extracted from the parsed data.
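For readers new to the format: CoNLL-X files, as written by MaltParser, hold one token per line with ten tab-separated columns (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL) and a blank line between sentences. The Python sketch below is not Sketch Engine code. It only shows how such data can be read and filtered in roughly the way the CQL query [deprel="OBJ"] filters it. The toy sentences, the Token class and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List

# Hypothetical toy data in CoNLL-X format: one token per line, ten
# tab-separated columns, blank line between sentences.
SAMPLE = """\
1\tJohn\tjohn\tN\tNNP\t_\t2\tSBJ\t_\t_
2\tgave\tgive\tV\tVBD\t_\t0\tROOT\t_\t_
3\tMary\tmary\tN\tNNP\t_\t2\tIOBJ\t_\t_
4\ta\ta\tD\tDT\t_\t5\tNMOD\t_\t_
5\tbook\tbook\tN\tNN\t_\t2\tOBJ\t_\t_
6\t.\t.\tP\t.\t_\t2\tP\t_\t_

1\tShe\tshe\tP\tPRP\t_\t2\tSBJ\t_\t_
2\tread\tread\tV\tVBD\t_\t0\tROOT\t_\t_
3\tthe\tthe\tD\tDT\t_\t4\tNMOD\t_\t_
4\tpaper\tpaper\tN\tNN\t_\t2\tOBJ\t_\t_
5\t.\t.\tP\t.\t_\t2\tP\t_\t_
"""

@dataclass
class Token:
    id: int      # position in the sentence, starting at 1
    form: str
    lemma: str
    postag: str
    head: int    # id of the governing token, 0 for the root
    deprel: str  # dependency label, e.g. "OBJ"

def read_conll(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield one list of Tokens per sentence."""
    sentence: List[Token] = []
    for line in lines:
        if not line.strip():          # a blank line ends a sentence
            if sentence:
                yield sentence
            sentence = []
            continue
        cols = line.rstrip("\n").split("\t")
        sentence.append(Token(int(cols[0]), cols[1], cols[2], cols[4],
                              int(cols[6]), cols[7]))
    if sentence:                      # the file may not end with a blank line
        yield sentence

def find_deprel(sentences: Iterable[List[Token]], deprel: str) -> Iterator[Token]:
    """Rough analogue of the CQL query [deprel="..."]."""
    for sentence in sentences:
        for token in sentence:
            if token.deprel == deprel:
                yield token

sentences = list(read_conll(SAMPLE.splitlines()))
print([t.form for t in find_deprel(sentences, "OBJ")])  # ['book', 'paper']
```

On a corpus the size of sDeWaC you would stream the file line by line instead of building a list, which is what the generator-based reader allows.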

Paper describing how Sketch Engine handles the CoNLL format: http://www.lrec-conf.org/proceedings/lrec2012/pdf/585_Paper.pdf

I have uploaded a portion of the Penn Treebank (dependency version), which you can play with at http://corpdev.sketchengine.co.uk/run.cgi/first_form?corpname=23399c07

Sample CQL (corpus query language) queries:

1. All examples of the dependency relation OBJ: [deprel="OBJ"] <http://bit.ly/Q2Pv6F>

2. Keyword in Context (KWIC) lines for the OBJ relation, i.e. a head followed by its OBJ dependent with at most five tokens in between: 1:[] []{0,5} 2:[deprel="OBJ"] & 2.head=1.id <http://bit.ly/QEWGIR>

3. Tag patterns of the OBJ relation: 1:[] []{0,5} 2:[deprel="OBJ"] & 2.head=1.id <http://bit.ly/QEWZTO>

4. Word sketch of a word, e.g. give-v, extracted from the dependency corpus: http://bit.ly/QEXptr
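To make the semantics of the queries above concrete, here is a small Python approximation (not Sketch Engine's implementation). It re-reads CoNLL-X data and (a) mimics query 2, 1:[] []{0,5} 2:[deprel="OBJ"] & 2.head=1.id, by requiring the head to precede its OBJ dependent with zero to five tokens in between, and (b) counts the OBJ dependents of a verb lemma as a crude stand-in for one grammatical relation of a word sketch. The sentences, column constants and function names are hypothetical, and the code assumes sentence-internal token ids that start at 1 with no gaps.

```python
from collections import Counter
from typing import Iterable, Iterator, List

# Hypothetical toy data in CoNLL-X format (ten tab-separated columns).
SAMPLE = """\
1\tJohn\tjohn\tN\tNNP\t_\t2\tSBJ\t_\t_
2\tgave\tgive\tV\tVBD\t_\t0\tROOT\t_\t_
3\tMary\tmary\tN\tNNP\t_\t2\tIOBJ\t_\t_
4\ta\ta\tD\tDT\t_\t5\tNMOD\t_\t_
5\tbook\tbook\tN\tNN\t_\t2\tOBJ\t_\t_
6\t.\t.\tP\t.\t_\t2\tP\t_\t_

1\tWe\twe\tP\tPRP\t_\t2\tSBJ\t_\t_
2\tgave\tgive\tV\tVBD\t_\t0\tROOT\t_\t_
3\ta\ta\tD\tDT\t_\t4\tNMOD\t_\t_
4\ttalk\ttalk\tN\tNN\t_\t2\tOBJ\t_\t_
5\t.\t.\tP\t.\t_\t2\tP\t_\t_

1\tThis\tthis\tD\tDT\t_\t2\tNMOD\t_\t_
2\tbook\tbook\tN\tNN\t_\t4\tOBJ\t_\t_
3\tI\ti\tP\tPRP\t_\t4\tSBJ\t_\t_
4\tread\tread\tV\tVBD\t_\t0\tROOT\t_\t_
5\t.\t.\tP\t.\t_\t4\tP\t_\t_
"""

# Column indices in a CoNLL-X row.
ID, FORM, LEMMA, HEAD, DEPREL = 0, 1, 2, 6, 7

def read_conll(lines: Iterable[str]) -> Iterator[List[List[str]]]:
    """Yield each sentence as a list of column lists."""
    sentence: List[List[str]] = []
    for line in lines:
        if not line.strip():
            if sentence:
                yield sentence
            sentence = []
            continue
        sentence.append(line.rstrip("\n").split("\t"))
    if sentence:
        yield sentence

def head_obj_spans(sentences, max_gap: int = 5) -> Iterator[str]:
    """Approximates 1:[] []{0,5} 2:[deprel="OBJ"] & 2.head=1.id:
    the head (1) precedes its OBJ dependent (2) with 0..max_gap tokens between."""
    for sent in sentences:
        for tok in sent:
            if tok[DEPREL] != "OBJ":
                continue
            dep_id, head_id = int(tok[ID]), int(tok[HEAD])
            gap = dep_id - head_id - 1
            if head_id > 0 and 0 <= gap <= max_gap:
                # ids are 1-based and contiguous, list indices are 0-based
                yield " ".join(t[FORM] for t in sent[head_id - 1:dep_id])

def object_collocates(sentences, verb_lemma: str) -> Counter:
    """Raw counts of OBJ dependents of a verb lemma, in any word order."""
    counts: Counter = Counter()
    for sent in sentences:
        for tok in sent:
            head_id = int(tok[HEAD])
            if tok[DEPREL] == "OBJ" and head_id > 0:
                if sent[head_id - 1][LEMMA] == verb_lemma:
                    counts[tok[LEMMA]] += 1
    return counts

sentences = list(read_conll(SAMPLE.splitlines()))
print(list(head_obj_spans(sentences)))  # ['gave Mary a book', 'gave a talk']
print(object_collocates(sentences, "give"))
```

The third sentence ("This book I read") is skipped by head_obj_spans because the object precedes its head, but object_collocates still counts it since it ignores word order. Sketch Engine's word sketches rank collocates by a salience statistic rather than raw frequency, so object_collocates only illustrates the underlying head-dependent lookup.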

For more details, please contact me directly.

Siva

On Mon, Jul 30, 2012 at 2:28 PM, Niels Ott <nott at sfs.uni-tuebingen.de> wrote:


> Dear Corpora People,
>
> I spent some time googling for a tool that allows one to explore and query
> huge dependency-annotated corpora. By huge I'm referring to something
> as large as sDeWaC (~44M sentences), annotated the way MaltParser would
> do it automagically. I found no such tool.
>
> How do people search for things in dependency treebanks?
>
> Thanks for your time and help.
>
> Best
>
> Niels Ott
>
>
> --
> Niels Ott (M.A.), Computational Linguist
> SFB 833 "Bedeutungskonstitution", Projekt A4, Universität Tübingen
> http://www.sfs.uni-tuebingen.de/~nott

-- 
http://sivareddy.in