[Corpora-List] Syntactically annotated corpus of a Spanish Dialect such as Buenos Aires or Los Angeles

Mark Davies Mark_Davies at byu.edu
Sun Oct 19 15:07:06 CEST 2008

You might try the Corpus del Español (www.corpusdelespanol.org).

For preverbal doubling (a ellos les dijeron), you'd enter something like:

a [p*] [p*] [v*]

For post-verbal (decirles a ellos), try something like:

[vr*+] a [p*]

In both cases, the query returns all of the several thousand matching tokens in 3-4 seconds.
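To make the two query patterns concrete, here is a minimal Python sketch of how such a sequence pattern could be matched against a POS-tagged sentence. It is only an illustration under stated assumptions, not how the Corpus del Español search engine works. It assumes that a bracketed element such as [p*] is a wildcard over the part-of-speech tag and that a bare element such as a matches the word form. The tags in the example sentences are invented; in particular, reading the trailing + in [vr*+] as a literal tag character marking an infinitive with an attached clitic is an unconfirmed assumption. The names find_matches and element_matches are hypothetical.

```python
from fnmatch import fnmatchcase

# A token is a (word form, POS tag) pair.
Token = tuple[str, str]

# Hypothetical hand-tagged examples. These tags are invented for illustration
# and are NOT the Corpus del Español tagset. We assume (unconfirmed) that
# pronoun tags start with "p", verb tags start with "v", and a trailing "+"
# marks an infinitive with an attached clitic (e.g. "decirles").
PREVERBAL: list[Token] = [
    ("a", "sp"), ("ellos", "pp"), ("les", "pp"),
    ("dijeron", "vm"), ("la", "da"), ("verdad", "nc"),
]
POSTVERBAL: list[Token] = [
    ("quiero", "vm"), ("decirles", "vr+"), ("a", "sp"), ("ellos", "pp"),
]


def element_matches(element: str, token: Token) -> bool:
    """A bracketed element is a wildcard over the tag; a bare element must
    equal the word form (case-insensitive)."""
    word, tag = token
    if element.startswith("[") and element.endswith("]"):
        # fnmatchcase: "*" matches any run of characters; "+" is literal.
        return fnmatchcase(tag, element[1:-1])
    return word.lower() == element.lower()


def find_matches(pattern: str, tokens: list[Token]) -> list[list[str]]:
    """Slide a window of len(pattern) tokens and keep every full match."""
    elements = pattern.split()
    n = len(elements)
    hits = []
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        if all(element_matches(e, t) for e, t in zip(elements, window)):
            hits.append([word for word, _ in window])
    return hits


if __name__ == "__main__":
    print(find_matches("a [p*] [p*] [v*]", PREVERBAL))  # [['a', 'ellos', 'les', 'dijeron']]
    print(find_matches("[vr*+] a [p*]", POSTVERBAL))    # [['decirles', 'a', 'ellos']]
```

A real corpus interface would run such a search against an index rather than scanning sentence by sentence; the linear scan here just shows what each pattern element is meant to match.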

FYI, the Corpus del Español contains 100 million words, including 20 million from the 1900s. The 1900s portion is equally balanced among spoken, fiction, newspaper, and academic texts, so you can make useful cross-genre comparisons. Since the corpus also contains texts from earlier centuries (e.g. 20 million words from the 1800s), you can trace the historical development of the construction. Finally, because the spoken section includes the entire Habla Culta corpus, you can compare the construction across different dialects.


Mark Davies

============================================
Mark Davies
Professor of (Corpus) Linguistics
Brigham Young University
(phone) 801-422-9168 / (fax) 801-422-0906
Web: davies-linguistics.byu.edu

** Corpus design and use // Linguistic databases **
** Historical linguistics // Language variation **
** English, Spanish, and Portuguese **
============================================

________________________________________
From: corpora-bounces at uib.no [corpora-bounces at uib.no] On Behalf Of Carlos A. Gomez Gallo [cgomez at cs.rochester.edu]
Sent: Saturday, October 18, 2008 10:37 PM
To: corpora at uib.no
Subject: [Corpora-List] Syntactically annotated corpus of a Spanish Dialect such as Buenos Aires or Los Angeles

Good morning,

I am starting a project on double clitic omission in Spanish. Does anybody know of a syntactically annotated corpus of a Latin American Spanish dialect that allows clitic doubling and its omission? The dialects most studied in the literature are those of Buenos Aires and Los Angeles, but any other would do. Suggestions on where I can find such corpora, or anything related, would be appreciated. If you prefer, you can write to me individually and I will post a summary back to the list afterwards.

Many thanks,
Carlos

--
Carlos A. Gomez Gallo
Computer Science and Linguistics Ph.D. candidate
Email: cgomez at cs.rochester.edu
Webpage: www.cs.rochester.edu/~cgomez

Snail Mail:
Department of Computer Science
734 Computer Studies Building
University of Rochester
Rochester, NY 14627

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora
