You might try the Corpus del Español (www.corpusdelespanol.org).

For preverbal doubling (a ellos les dijeron), you'd enter something like:

a [p*] [p*] [v*]

For post-verbal (decirles a ellos), try something like:

[vr*+] a [p*]

In both cases, it will find all of the several thousand matching tokens in 3-4 seconds.
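If it helps to see what a query like the first one is doing, here is a rough Python sketch. It is not the corpus's own implementation. It assumes, as the queries above suggest, that a bare word matches literally and that a bracketed slot such as [p*] matches any token whose part-of-speech tag starts with that prefix. The tag names (adp, pron, verb, ...) and the function names are made up for illustration, and the + in the second query is not handled.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word form, POS tag); tag names here are hypothetical


def slot_matches(slot: str, token: Token) -> bool:
    """Return True if one query slot matches one tagged token."""
    word, tag = token
    if slot.startswith("[") and slot.endswith("]"):
        prefix = slot[1:-1].rstrip("*")  # "[p*]" -> "p"
        return tag.startswith(prefix)
    return word.lower() == slot.lower()  # bare slot: literal word match


def find_matches(query: str, tokens: List[Token]) -> List[List[str]]:
    """Slide the query over the tokens and collect matching word sequences."""
    slots = query.split()
    hits = []
    for i in range(len(tokens) - len(slots) + 1):
        window = tokens[i:i + len(slots)]
        if all(slot_matches(s, t) for s, t in zip(slots, window)):
            hits.append([w for w, _ in window])
    return hits


# Hand-tagged toy sentence: "A ellos les dijeron la verdad"
sentence = [("A", "adp"), ("ellos", "pron"), ("les", "pron"),
            ("dijeron", "verb"), ("la", "det"), ("verdad", "noun")]

print(find_matches("a [p*] [p*] [v*]", sentence))
# [['A', 'ellos', 'les', 'dijeron']]
```

The same function could be run over any locally tagged text, provided you adapt the slot prefixes to whatever tagset your tagger uses.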

FYI, the Corpus del Español contains 100 million words, including 20 million from the 1900s. The 1900s portion is equally balanced between spoken, fiction, newspaper, and academic texts, so you can do nice cross-genre comparisons. Since it also has texts from earlier centuries (e.g. 20 million words from the 1800s), you can look at the historical development of the construction. Finally, because the spoken portion includes the entire Habla Culta corpus, you can also compare across dialects.
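For the cross-genre comparison, raw hit counts are easiest to compare once they are normalized per million words. Here is a minimal Python sketch. It assumes the 20 million words from the 1900s split evenly into four genres of 5 million words each, as described above. The hit counts are placeholders, not real corpus results, and the function name is made up.

```python
# Assumption: 20 million words from the 1900s, evenly split across four genres.
GENRE_SIZE = 20_000_000 // 4  # 5,000,000 words per genre


def per_million(count: int, size: int = GENRE_SIZE) -> float:
    """Convert a raw token count into a frequency per million words."""
    return count * 1_000_000 / size


# Placeholder counts for illustration only; substitute the counts you get back.
hypothetical_hits = {"spoken": 1500, "fiction": 1000,
                     "newspaper": 500, "academic": 250}

for genre, count in hypothetical_hits.items():
    print(f"{genre:10s} {per_million(count):7.1f} per million")
```

If the genre sections turn out not to be exactly equal in size, pass the actual section size as the second argument.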


Good morning, I am starting a project on double clitic omission in Spanish. Does anybody know of a syntactically annotated Spanish corpus of a Latin American dialect that allows double clitics and their omission? The dialects most studied in the literature are those of Buenos Aires and Los Angeles, but any other will do. Suggestions on where I can find these or anything related would be appreciated. If you prefer, you can write to me individually and I will post a summary back to the list afterwards.

Many thanks, Carlos

