[Corpora-List] Spanish and Latin-American Spanish parallel corpora

Mark Davies Mark_Davies at byu.edu
Mon Jul 30 19:09:48 CEST 2018

I'm not sure if you want truly "parallel corpora" from different Spanish dialects. In other words, texts produced (for example) in Spain, but then translated(??) / altered for a Mexican audience (???).

But if you want *comparable* corpora from different Spanish dialects, then you might consider:


There are about 2 *billion* words of data from 21 different Spanish-speaking countries, and it's very easy to compare across dialects, e.g.:

güero (MX): https://www.corpusdelespanol.org/web-dial/?c=span&q=67843438
ordenador (ES): https://www.corpusdelespanol.org/web-dial/?c=span&q=67843441
PARA SUBJ VINF (e.g. para ella entender) (Caribbean): https://www.corpusdelespanol.org/web-dial/?c=span&q=67843432

In addition, in about 3-4 weeks I'll be releasing a *5.1+ billion* word corpus of Spanish that is similar to (English) NOW (https://corpus.byu.edu/now/), in that it will be updated every month (by about 100 million words) and will have data from the same 21 countries as above. In this way, you'll be able to track changes between dialects and over time.


Mark Davies

============================================
Mark Davies
Professor of Linguistics / Brigham Young University
http://davies-linguistics.byu.edu/

** Corpus design and use // Linguistic databases **
** Historical linguistics // Language variation **
** English, Spanish, and Portuguese **
============================================

________________________________________
From: corpora-bounces at uib.no <corpora-bounces at uib.no> on behalf of Nicola Bertoldi <bertoldi at fbk.eu>
Sent: Monday, July 30, 2018 6:26 AM
To: corpora at uib.no
Subject: [Corpora-List] Spanish and Latin-American Spanish parallel corpora

Dear all,

I am looking for parallel corpora between English and Spanish dialects: Spanish of Spain (es-ES), Mexican Spanish (es-MX), and so on.

I would also be interested in parallel corpora between Spanish dialects (e.g. es-ES vs es-MX).

Any suggestions on where to find such resources are very welcome.

best, Nicola

