[Corpora-List] Corpora for language identification training?

Adam Funk a.funk at dcs.shef.ac.uk
Thu Apr 19 15:46:00 CEST 2007


[19/04/07 13:35] Dean Jones wrote:


> Sorry, I wasn't clear. Personally I'm interested in language ID for
> "written" texts - specifically, email, although others on the list may
> be interested in spoken language ID, so I wouldn't want to discourage
> responses about that.


Here's a tool you might be interested in:

http://www.let.rug.nl/~vannoord/TextCat/


along with a list of others:

http://www.let.rug.nl/~vannoord/TextCat/competitors.html




