[Corpora-List] corpora with regular expression engine (syntactic pattern)

Mark Davies Mark_Davies at byu.edu
Mon Feb 25 22:49:06 CET 2013


As long as others are listing online interfaces to large corpora that do regular expressions / wildcards, I might as well mention the BYU corpora (http://corpus.byu.edu).

For example, BYU-BNC (http://corpus.byu.edu/bnc) can do "[vh*] [v?n*] [a*] [jj*] [nn*]" in less than four seconds:

http://corpus.byu.edu/bnc/?c=bnc&q=21313156
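
For readers unfamiliar with this kind of query, here is a rough sketch in Python of what a pattern like the one above asks for: five consecutive tokens whose POS tags match the five bracketed slots. It assumes that "*" stands for any run of characters and "?" for exactly one character, and it uses a hand-tagged toy sentence with CLAWS7-style tags. The function names and the sample data are hypothetical, and this is not how the BYU interface itself is implemented.

```python
from fnmatch import fnmatchcase

def parse_query(query):
    """Split '[vh*] [v?n*]' into a list of tag patterns: ['vh*', 'v?n*']."""
    return [slot.strip("[]") for slot in query.split()]

def find_matches(tagged, query):
    """Return every run of consecutive tokens whose tags match the query slots.

    tagged is a list of (word, tag) pairs; the tags here are assumed to be
    CLAWS7-style and are compared case-insensitively.
    """
    patterns = parse_query(query)
    n = len(patterns)
    hits = []
    for i in range(len(tagged) - n + 1):
        window = tagged[i:i + n]
        if all(fnmatchcase(tag.lower(), pat.lower())
               for (_, tag), pat in zip(window, patterns)):
            hits.append(" ".join(word for word, _ in window))
    return hits

# Toy sentence, tagged by hand for illustration only.
sentence = [
    ("it", "pph1"), ("has", "vhz"), ("been", "vbn"),
    ("a", "at1"), ("great", "jj"), ("success", "nn1"), (".", "."),
]

print(find_matches(sentence, "[vh*] [v?n*] [a*] [jj*] [nn*]"))
```

A linear scan like this is only meant to show what the query means; answering it over a 100-million-word corpus in a few seconds needs an indexed back-end.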

And of course the interface also allows searches by synonyms, lemma, wildcards, alternates, customized word lists, and any combination of these.

MD

============================================
Mark Davies
Professor of Linguistics / Brigham Young University
http://davies-linguistics.byu.edu/

** Corpus design and use // Linguistic databases **
** Historical linguistics // Language variation **
** English, Spanish, and Portuguese **
============================================

________________________________
From: corpora-bounces at uib.no [corpora-bounces at uib.no] on behalf of Gemma Boleda [gemma.boleda at upf.edu]
Sent: Monday, February 25, 2013 2:24 PM
To: Corpora at uib.no
Subject: Re: [Corpora-List] corpora with regular expression engine (syntactic pattern)

Hi Austina,

there are also a couple of online corpus interfaces that allow POS queries written as regular expressions, for example:

Serge Sharoff's "Leeds CQP" search interface (English corpora available, and also corpora for other languages): http://corpus.leeds.ac.uk/internet.html

UPF's interface to CUCWeb (Catalan corpus): http://ramsesii.upf.es/cgi-bin/cucweb/search-form.pl?lang=en_US

These two interfaces are based on the IMS Open Corpus Workbench that Marco Baroni mentioned; indeed, this tool provides a module to easily build web interfaces with its core corpus processor as a back-end.

Best, Gemma.

--
Gemma Boleda
The University of Texas at Austin
http://gboleda.utcompling.com
