[Corpora-List] Corpora for language identification training?

Dean Jones dean.m.jones at gmail.com
Thu Apr 19 11:07:01 CEST 2007

Hello all,

I'd like to train a classifier to perform language identification,
and, before I go ahead and create a corpus for this purpose, I'd like
to ask whether anyone on this list knows of anything suitable. The
main reason I'm asking is that I'm particularly interested in finding
something which has been used in the comparative evaluation of
language identification systems. Languages that we'd initially like to
cover are English, French, Italian, German and Spanish. Thanks for any help.

Best wishes,
