[Corpora-List] Corpora for language identification training?

Vlado Keselj vlado at cs.dal.ca
Thu Apr 19 16:52:00 CEST 2007


Hi,

You can find several links relevant to written language identification at:
http://users.cs.dal.ca/~vlado/nlp/#nlp/tc/langid

Here is the URL list as well:

cat:nlp/tc/langid
name:Language identification tools, by Gertjan van Noord (TextCat)
URL:http://odur.let.rug.nl/~vannoord/TextCat/competitors.html

cat:nlp/tc/langid
name:On-line tool by Steve Huffman
URL:http://complingone.georgetown.edu/~langid/

cat:nlp/tc/langid
URL:http://cslu.cse.ogi.edu/HLTsurvey/ch8node9.html
name:Chapter on Automatic Language Identification
description: in <a href="http://cslu.cse.ogi.edu/HLTsurvey/HLTsurvey.html">
Survey of the State of the Art in Human Language Technology</a> by
several editors

cat:nlp/tc/langid
URL:http://www.faganfinder.com/translate/identify.php
name:A language identification tool at Fagan Finder

cat:nlp/tc/langid
URL:http://www.translation-guide.com/language_identification.htm
name:Another language identification tool

cat:nlp/tc/langid
URL:http://www.xrce.xerox.com/people/beesley/langid.html
name:Language identifier by Ken Beesley

cat:nlp/tc/langid
URL:http://dis.tpd.tno.nl/druid/lid/lid_index.html
name:DRUID, a language identification tool

cat:nlp/tc/langid
URL:http://www.w3.org/TR/2004/REC-xml-20040204/#sec-lang-tag
name:Specifying language excerpts in XML

cat:nlp/tc/langid
URL:http://www-rali.iro.umontreal.ca/ProjetSILC.en.html
name:SILC project at RALI

cat:nlp/tc/langid
URL:http://veristage.com/demo/test3.php
name:Language Identification tool
description: by Veristage; minimum 40 characters

cat:nlp/tc/langid
URL:http://www.sil.org/silewp/2000/001/SILEWP2000-001.html
name:Language identification and IT: Addressing problems of linguistic
diversity on a global scale
description: by Peter Constable and Gary Simons, SIL International;
about language tagging

cat:nlp/tc/langid
URL:http://www.usdoj.gov/crt/cor/Pubs/ISpeakCards.pdf
name:Language identification flashcard
description:by US Dept. of Commerce

cat:nlp/tc/langid
URL:http://www.research.microsoft.com/~joshuago/physicslongcomment.ps
name:Comment by J. Goodman on a Physics paper about Language Trees and
Zipping, which got a lot of press coverage in 2001

cat:nlp/tc/langid
URL:http://www.unhchr.ch/udhr/navigate/alpha.htm
name:Universal Declaration of Human Rights
description:UN, in 363 languages (17 Jun 2004)


--Vlado

On Thu, 19 Apr 2007, Adam Funk wrote:

> [19/04/07 13:35] Dean Jones wrote:
>
> > Sorry, I wasn't clear. Personally I'm interested in language ID for
> > "written" texts - specifically, email, although others on the list may
> > be interested in spoken language ID, so I wouldn't want to discourage
> > responses about that.
>
> Here's a tool you might be interested in:
>
> http://www.let.rug.nl/~vannoord/TextCat/
>
> along with a list of others:
>
> http://www.let.rug.nl/~vannoord/TextCat/competitors.html




