[Corpora-List] Large free access and downloadable corpora

Mark Davies Mark_Davies at byu.edu
Sun Feb 2 16:17:19 CET 2020

You might take a look at:


The samples for these corpora are free. The free samples contain more than 30 million words of data, and the datasets that can be purchased contain more than 25 billion words.

>> and preferably stored on a relational database

One of the three formats is a relational database. These are the same databases that are used for:



Mark Davies

============================================
Mark Davies
Professor of Linguistics / Brigham Young University
http://davies-linguistics.byu.edu/

** Corpus design and use // Linguistic databases **
** Historical linguistics // Language variation **
** English, Spanish, and Portuguese **
============================================

________________________________________
From: corpora-bounces at uib.no <corpora-bounces at uib.no> on behalf of Jayr Alencar Pereira <jap2 at cin.ufpe.br>
Sent: Saturday, February 1, 2020 5:36 AM
To: corpora at uib.no
Subject: [Corpora-List] Large free access and downloadable corpora

Hi everybody,

I am looking for a large corpus annotated with at least POS and lemma and preferably stored on a relational database or any other structure that allows searching by tokens.

It is for my MSc project. I am extracting semantic linguistic information like predicate-argument relations. However, the corpus need not be annotated with this kind of information.
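To make the request concrete, here is a minimal sketch of a relational layout that allows searching by tokens, using Python's built-in sqlite3 module. The table name `tokens`, its columns, the sample rows, and the helper `find_by_lemma` are all hypothetical and chosen only for illustration; they do not describe the schema of any particular corpus.

```python
import sqlite3

# Hypothetical schema: one row per token, with word form, lemma, and POS tag.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE tokens (
        text_id  INTEGER NOT NULL,  -- which text the token belongs to
        position INTEGER NOT NULL,  -- token offset within that text
        word     TEXT NOT NULL,
        lemma    TEXT NOT NULL,
        pos      TEXT NOT NULL,
        PRIMARY KEY (text_id, position)
    )
    """
)
# An index on (lemma, pos) keeps lemma and POS lookups fast on large tables.
conn.execute("CREATE INDEX idx_tokens_lemma ON tokens (lemma, pos)")

# Made-up sample data for the sentence "The dogs chased cats".
rows = [
    (1, 0, "The", "the", "DET"),
    (1, 1, "dogs", "dog", "NOUN"),
    (1, 2, "chased", "chase", "VERB"),
    (1, 3, "cats", "cat", "NOUN"),
]
conn.executemany("INSERT INTO tokens VALUES (?, ?, ?, ?, ?)", rows)


def find_by_lemma(conn, lemma, pos=None):
    """Return (text_id, position, word) for tokens with the given lemma and optional POS tag."""
    sql = "SELECT text_id, position, word FROM tokens WHERE lemma = ?"
    params = [lemma]
    if pos is not None:
        sql += " AND pos = ?"
        params.append(pos)
    sql += " ORDER BY text_id, position"
    return conn.execute(sql, params).fetchall()


hits = find_by_lemma(conn, "chase", "VERB")
```

Storing the position alongside each token means neighbouring tokens can be retrieved with a self-join on `text_id` and `position`, which is one possible starting point for extracting predicate-argument candidates.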

Best regards,

-- ** Pax et bonum

Jayr Alencar Pereira.
Master's Degree Student
Center of Informatics, Federal University of Pernambuco, Recife - Brazil
Homepage: http://www.jayralencar.com.br
GitHub: @jayralencar (https://github.com/jayralencar)
CV Lattes: http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K8561724U9
