[Corpora-List] Phonetic corpora typology

Mike Maxwell maxwell at umiacs.umd.edu
Mon Mar 8 14:39:06 CET 2010


I hate to drag this out, but...

Bryar Family wrote:
> Yuri: RE: Language vs. Dialect The question is a marvelous one. I'm
> no expert, but as any of the linguists on this list can tell you,
> the terms are politically defined,

In popular use, that's true. That is not (supposed to be) the way linguists distinguish language from dialect.


> ...and that no objective set of
> metrics involving isoglosses or other set of linguistic distinctions
> are going to be very helpful.

IMO, the problem is not that there is no metric, but rather that there are many borderline cases (as linguists recognize). Given that, no metric can have a non-arbitrary cutoff. One common metric is mutual intelligibility, but intelligibility is a matter of degree, not to mention that in practice it is often obscured by things like the familiarity of speakers of one variety with the other, greater or lesser bilingualism, schooling in whichever variety is politically/socially/economically dominant, etc.


> The concepts of language vs. dialect
> need to be understood as localized social and political constructs

That's a different definition (not wrong, just different).


> None are based on anything but
> socio-political declarations.

Not true, for example:


> For example, look at the
> ETHNOLOGUE.http://www.ethnologue.com/home.asp The Ethnologue and the
> accompanying SIL bibliography http://www.sil.org/ attempt to use
> objective linguistic metrics and have built an imposing academic
> citation index to buttress its decisions as to what is a language and
> what is a dialect.

And in many cases these are based on actual testing of mutual intelligibility, on the ground, using stories and questions (with known answers) recorded in one area and tested in neighboring areas.
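
As a rough illustration of how a test like that might be scored (a sketch only: the function name, the data layout, and the numbers below are made up for illustration and are not taken from any actual survey), each subject hears the recorded story and answers the questions, and the score is the percentage of questions answered correctly:

```python
def comprehension_score(responses):
    """Percentage of correct answers over all subjects and questions.

    `responses` is a hypothetical layout: one list per test subject,
    holding True/False for each question answered correctly or not.
    """
    total = sum(len(subject) for subject in responses)
    if total == 0:
        raise ValueError("no responses to score")
    correct = sum(sum(subject) for subject in responses)
    return 100.0 * correct / total


# Invented data: three subjects, ten questions each (8, 7 and 9 correct).
village_b = [
    [True] * 8 + [False] * 2,
    [True] * 7 + [False] * 3,
    [True] * 9 + [False] * 1,
]
print(comprehension_score(village_b))
```

Running the test in each direction gives two numbers, which need not be equal.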


> This and the ISO language list are widely used
> references, but they are loaded with arbitrary delineations.

Unavoidable, given the gradations present in the world, and of course recognized by linguists.

> Based in
> part on the Ethnologue, Papua New Guinea is supposed to have
> literally hundreds of languages. However, a close examination of the
> Ethnologue reveals that "Gapapaiwa" and "Ghayavi" are held to be
> separate PNG languages, yet they have a "73%" lexical similarity.

And?


> This declaration begs all sorts of questions. First of all, how is
> this "similarity" measured with such precision given these languages
> vary from village to village? Who knows!

It is documented, and there are courses and books on doing this kind of testing. So someone knows :-). Several surveys measuring similarity are cited in Bryar's posting, answering his own question.


> On the other hand, "Galeya" and "Basima" [in PNG]
> are supposed to be dialects based on a purported 80%
> lexical similarity.

A few points: yes, it is quite possible that 80% lexical similarity would allow mutual intelligibility while 73% would break it, although one could also ask how one defines the border of "mutual intelligibility." But of course varieties of languages differ in more than their lexicons (and more than their phonology/phonetics, which I believe is Yuri's method). There's also morphology, syntax, ... In this case, I doubt that the dialect vs. language decision was made purely on the basis of lexical similarity, although that's a quick-and-dirty method when you haven't had time to try more refined ones.
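
For concreteness, here is a minimal sketch of the quick-and-dirty lexical measure, assuming the usual wordlist approach: elicit the same list of glosses in both varieties, have an analyst judge each pair of forms as similar or not, and report the percentage judged similar. The function name, the glosses, and the judgments are hypothetical; the hard part (deciding what counts as "similar") is not modeled here.

```python
def lexical_similarity(judgments):
    """Percentage of compared wordlist items judged similar.

    `judgments` maps a gloss to True (forms judged similar), False
    (judged different), or None (no form elicited in one variety,
    so the item is left out of the comparison).
    """
    compared = [j for j in judgments.values() if j is not None]
    if not compared:
        raise ValueError("no comparable wordlist items")
    return 100.0 * sum(compared) / len(compared)


# Invented judgments for a five-item list; one item could not be compared.
sample = {
    "water": True,
    "fire": True,
    "stone": False,
    "dog": True,
    "canoe": None,
}
print(lexical_similarity(sample))
```

Note that this number says nothing about morphology or syntax, and it is a different kind of number from a comprehension score.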


> Affiliated linguists have conducted
> various local field studies...
> ...Here is another
> conducted in Ethiopia: Gutt, Ernst-August. 1980. "Intelligibility and
> interlingual comprehension among selected Gurage speech varieties."
> http://www.ethnologue.com/show_work.asp?id=50110 ...
> Here the researchers conclude, "The Dobi dialect comprehension of
> Soddo is 76%, and Soddo speakers’ of Dobi is 90%." Thus similar
> levels of mutual comprehension make you a language in New Guinea and
> a dialect in Ethiopia!

No, you're comparing two entirely different measures: lexical similarity in New Guinea, and comprehension in Ethiopia.

One final comment:
> Why is Standard Arabic "standard" given that
> far more people speak the Egyptian variety?

An interesting question, given that MSA is no one's first language; in some sense, it's more like an Arabic Esperanto, or the modern use of Latin in Rome.

--

Mike Maxwell

What good is a universe without somebody around to look at it?

--Robert Dicke, Princeton physicist


