[Corpora-List] corpora with regular expression engine (syntactic pattern)

Marco Baroni marco.baroni at unitn.it
Sun Feb 24 14:43:25 CET 2013


Dear Austina,

If I understand your question correctly, it pertains more to the query engine you use to search the corpus than to the corpus itself (assuming the corpus is POS-tagged).

Given a corpus with POS tags (for English and French, for example, you can find some here: http://wacky.sslmit.unibo.it/doku.php?id=corpora), you can index it with the IMS Open Corpus Workbench (http://cwb.sourceforge.net/), and you will then be able to issue queries expressed as regular expressions over sequences of POS tags, e.g., things like:

VERB ART? ADJ* NOUN (a verb, then an optional article, zero or more adjectives, and a noun)
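
To make the idea concrete without installing anything, here is a minimal Python sketch of matching such a pattern against a POS-tagged sentence. It is not how CWB works internally, and the function names (compile_pos_pattern, find_matches), the tag labels, and the example sentence are all made up for illustration. It assumes tags are purely alphabetic and that each pattern item is a tag optionally followed by ?, * or +. In CWB's own query language (CQP), the same pattern would look roughly like [pos="VERB"] [pos="ART"]? [pos="ADJ"]* [pos="NOUN"], with the attribute name and tag values depending on how the corpus was tagged and encoded.

```python
import re


def compile_pos_pattern(pattern):
    """Compile a pattern such as 'VERB ART? ADJ* NOUN' into a regex
    that runs over a string of space-terminated POS tags."""
    # The lookbehind forces a match to begin at the start of a tag,
    # so 'VERB' cannot match inside a longer tag such as 'ADVERB'.
    parts = [r"(?<!\S)"]
    for item in pattern.split():
        m = re.fullmatch(r"([A-Za-z]+)([?*+]?)", item)
        if m is None:
            raise ValueError(f"unsupported pattern item: {item!r}")
        tag, quantifier = m.groups()
        # Each tag is followed by exactly one space in the tag string.
        parts.append(f"(?:{tag} ){quantifier}")
    return re.compile("".join(parts))


def find_matches(tagged_tokens, pattern):
    """Return the word sequences whose POS tags match the pattern.

    tagged_tokens is a list of (word, tag) pairs.
    """
    regex = compile_pos_pattern(pattern)
    tag_string = ""
    offset_to_index = {}  # character offset of a tag -> token index
    for index, (_, tag) in enumerate(tagged_tokens):
        offset_to_index[len(tag_string)] = index
        tag_string += tag + " "
    offset_to_index[len(tag_string)] = len(tagged_tokens)

    results = []
    for m in regex.finditer(tag_string):
        if m.end() == m.start():
            continue  # ignore empty matches from all-optional patterns
        start = offset_to_index[m.start()]
        end = offset_to_index[m.end()]
        results.append([word for word, _ in tagged_tokens[start:end]])
    return results


# Hypothetical hand-tagged sentence, for illustration only.
sentence = [
    ("She", "PRON"), ("bought", "VERB"), ("a", "ART"),
    ("shiny", "ADJ"), ("new", "ADJ"), ("car", "NOUN"),
]
print(find_matches(sentence, "VERB ART? ADJ* NOUN"))
# prints [['bought', 'a', 'shiny', 'new', 'car']]
```

A dedicated engine like CWB does this over an indexed corpus of millions of tokens; the sketch only shows what "a regular expression over a sequence of POS tags" means.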

Hth,

Marco


>
> 2013/2/24 Olivier Austina <olivier.austina at gmail.com>
>
> Hi Matías,
> English, French or Romanian but any language is welcome. Thank you.
>
> Austina
>
>
> 2013/2/24 Matías Guzmán <mortem.dei at gmail.com>
>
> At least give us the language you want.
>
> Matías Guzmán Naranjo.
>
>
> 2013/2/24 Olivier Austina <olivier.austina at gmail.com>
>
> Hi,
>
> Are there corpora which can be queried using Part Of Speech
> tags in a regular expression?
> --
> Regards
> Austina
>
>
> --
> Regards
> Austina
>
>

--
Marco Baroni
Center for Mind/Brain Sciences (CIMeC)
University of Trento
http://clic.cimec.unitn.it/marco


