[Corpora-List] Syntactically annotated corpus of a Spanish Dialect such as Buenos Aires or Los Angeles

"José M. García-Miguel" gallego at uvigo.es
Mon Oct 20 23:36:10 CEST 2008

You might also try the search form of ADESSE (http://adesse.uvigo.es/data/avanzado.php), with the following options:

Género textual = "Oral", Procedencia = "Hispanoamérica"

And in "Parámetros del argumento 1":

Función Sintáctica = "Objeto" (and also "cualquiera" / "directo" / "indirecto")
Categoría Sintáctica = "Cualquiera (no vacío)", or any specific category in the list ("FN" / "ProPers" / ...)
Concordancia/clítico = "Clítico objeto" [for doubling, e.g. "va a buscarlo al soldadito"], or
Concordancia/clítico = "Ninguno (nulo)" [for no doubling, e.g. "se queda a buscar a Pedro Páramo"]

This gives you the examples of doubling / no doubling from the Buenos Aires part of the Habla Culta corpus. The form offers several options to refine your search, but for the moment not every query that the original database could support is available (I hope they will be in the future).
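As a rough illustration of what the two settings of "Concordancia/clítico" select, here is a minimal Python sketch over a hand-made list holding only the two examples quoted above. The record layout and the field names (text, funcion, clitico) are assumptions made up for this illustration; they are not the actual ADESSE schema, and the site is queried through the web form, not through code like this.

```python
# Hypothetical records: the field names are illustrative, not the ADESSE schema.
EXAMPLES = [
    {"text": "va a buscarlo al soldadito",
     "funcion": "Objeto", "clitico": "Clítico objeto"},
    {"text": "se queda a buscar a Pedro Páramo",
     "funcion": "Objeto", "clitico": "Ninguno (nulo)"},
]


def select(records, clitico):
    """Return the texts of object arguments whose clitic value matches."""
    return [
        r["text"]
        for r in records
        if r["funcion"] == "Objeto" and r["clitico"] == clitico
    ]


doubling = select(EXAMPLES, "Clítico objeto")      # object doubled by a clitic
not_doubling = select(EXAMPLES, "Ninguno (nulo)")  # object without a clitic
```

Running the same selection once per clitic value mirrors running the search form twice, once for each setting.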

ADESSE is a semantically enriched version of BDS, a syntactic database of 160 thousand Spanish clauses built on the Arthus corpus (1.5 million words). The oral part of this corpus comprises the Madrid, Sevilla, and Buenos Aires texts from the Norma Culta corpus. This is why, in this case, a search over "Género textual = oral" and "Procedencia = Hispanoamérica" is equivalent to a search over Buenos Aires.
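The equivalence can be checked with a toy model of the oral subcorpora. This is only a sketch: the field names and the value "España" for the two peninsular cities are assumptions for illustration, not values taken from the database.

```python
# Toy model of the oral part of the corpus; field names are hypothetical,
# and "España" as the origin label for Madrid and Sevilla is an assumption.
ORAL_SUBCORPORA = [
    {"city": "Madrid", "genero": "Oral", "procedencia": "España"},
    {"city": "Sevilla", "genero": "Oral", "procedencia": "España"},
    {"city": "Buenos Aires", "genero": "Oral", "procedencia": "Hispanoamérica"},
]


def cities(subcorpora, genero, procedencia):
    """Return the cities whose texts match both filter values."""
    return [
        s["city"]
        for s in subcorpora
        if s["genero"] == genero and s["procedencia"] == procedencia
    ]


matches = cities(ORAL_SUBCORPORA, "Oral", "Hispanoamérica")
```

Since Buenos Aires is the only oral subcorpus from Hispanoamérica, the two filters together leave just that city.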


Jose M. Garcia-Miguel
University of Vigo
E-mail: gallego at uvigo.es
Web: webs.uvigo.es/weba575/jmgm/

> ----------------------------------------------------------------------
> Message: 1
> Date: Sun, 19 Oct 2008 07:07:06 -0600
> From: Mark Davies <Mark_Davies at byu.edu>
> Subject: Re: [Corpora-List] Syntactically annotated corpus of a
> Spanish Dialect such as Buenos Aires or Los Angeles
> To: "Carlos A. Gomez Gallo" <cgomez at cs.rochester.edu>,
> "corpora at uib.no" <corpora at uib.no>
> You might try the Corpus del Espanol (www.corpusdelespanol.org).
> For preverbal doubling (a ellos les dijeron), you'd enter something like:
> a [p*] [p*] [v*]
> For post-verbal (decirles a ellos), try something like:
> [vr*+] a [p*]
> In both cases, it will find all the several thousand tokens in 3-4 seconds.
> FYI, the Corpus del Espanol is 100 million words in size, including 20 million from the 1900s. For the 1900s, it is equally balanced between spoken, fiction, newspaper, and academic, which means that you can do nice cross-genre comparisons. Since it has texts from earlier centuries as well (e.g. 20 million words from the 1800s), you can look at the historical development of the construction as well. Finally, because the spoken has the entire Habla Culta corpus, you can do nice comparisons across different dialects.
> Best,
> Mark Davies
> ============================================
> Mark Davies
> Professor of (Corpus) Linguistics
> Brigham Young University
> (phone) 801-422-9168 / (fax) 801-422-0906
> Web: davies-linguistics.byu.edu
> ** Corpus design and use // Linguistic databases **
> ** Historical linguistics // Language variation **
> ** English, Spanish, and Portuguese **
> ============================================
> ________________________________________
> From: corpora-bounces at uib.no [corpora-bounces at uib.no] On Behalf Of Carlos A. Gomez Gallo [cgomez at cs.rochester.edu]
> Sent: Saturday, October 18, 2008 10:37 PM
> To: corpora at uib.no
> Subject: [Corpora-List] Syntactically annotated corpus of a Spanish Dialect such as Buenos Aires or Los Angeles
> Good Morning,
> I am starting on a project on double clitic omission in Spanish. Does
> anybody know of a syntactically annotated Spanish corpus of a Latin
> American dialect that allows double clitic and its omission? The dialects
> most studied in the literature are from Buenos Aires and Los Angeles, but
> any other will do.
> Suggestions where I can find these or anything related would be
> appreciated. If you prefer, you can write to me individually and I
> will post a summary back to the list afterwards.
> Many thanks,
> Carlos
> -- Carlos A. Gomez Gallo
> Computer Science and Linguistics Ph.D. candidate
> Email: cgomez at cs.rochester.edu
> Webpage: www.cs.rochester.edu/~cgomez
> Snail Mail:
> Department of Computer Science
> 734 Computer Studies Building
> University of Rochester
> Rochester, NY 14627
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
