[Corpora-List] language sort

Trond Trosterud trond.trosterud at hum.uit.no
Thu Jan 11 18:12:00 CET 2007



Maria Esteva wrote on 10 Jan. 2007 at 22:02:


> Dear all,
>
> I am wondering if somebody knows of a program that will recognize
> and sort large sets of files according to language.


In my experience, a single file may well contain more than one
language. In our work we identify the language at the paragraph
level, although we would often like to go down to the sentence
level.

We use text_cat, cf.
http://www.let.rug.nl/~vannoord/TextCat/
and have had very good experiences with it.
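text_cat is an implementation of the character n-gram method of Cavnar and Trenkle: each language is represented by a ranked list of its most frequent character n-grams, and a text is assigned to the language whose ranking is closest by the "out-of-place" distance. The following is a minimal sketch of that idea in Python, not text_cat itself, applied per paragraph as described above. The training sentences, the profile size of 300, n-gram orders 1 to 5, and the names `TRAINING`, `classify` and `classify_paragraphs` are all illustrative assumptions; a real system is trained on much larger samples per language.

```python
from collections import Counter

# Hypothetical toy training samples (assumption); real profiles need far more text.
TRAINING = {
    "english": "the quick brown fox jumps over the lazy dog and the cat sat on the mat with the hat",
    "norwegian": "den raske brune reven hopper over den late hunden og katten sitter på matta med hatten",
}

MAX_N = 5           # character n-grams of length 1..5
PROFILE_SIZE = 300  # keep only the most frequent n-grams; also the penalty for a missing n-gram


def ngram_profile(text, max_n=MAX_N, size=PROFILE_SIZE):
    """Map each of the `size` most frequent character n-grams to its rank (0 = most frequent)."""
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # mark word boundaries
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    return {gram: rank for rank, (gram, _) in enumerate(counts.most_common(size))}


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; an n-gram absent from the language profile costs PROFILE_SIZE."""
    total = 0
    for gram, rank in doc_profile.items():
        if gram in lang_profile:
            total += abs(rank - lang_profile[gram])
        else:
            total += PROFILE_SIZE
    return total


LANG_PROFILES = {lang: ngram_profile(text) for lang, text in TRAINING.items()}


def classify(text):
    """Return the language whose profile has the smallest out-of-place distance to `text`."""
    doc = ngram_profile(text)
    return min(LANG_PROFILES, key=lambda lang: out_of_place(doc, LANG_PROFILES[lang]))


def classify_paragraphs(text):
    """Split on blank lines and label each paragraph separately."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return [(classify(p), p) for p in paragraphs]


if __name__ == "__main__":
    sample = "the dog sat on the mat\n\nkatten og hunden sitter på matta"
    for lang, para in classify_paragraphs(sample):
        print(lang, "|", para)
```

Classifying per paragraph rather than per file is what lets a mixed-language file be split up; going down to sentence level would only change the splitting step, at the cost of shorter and therefore less reliable n-gram profiles.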

Trond.

----------------------------------------------------------------------
Trond Trosterud t +47 7764 4763
Institutt for språkvitskap, Det humanistiske fakultet m +47 950 70140
N-9037 Universitetet i Tromsø, Noreg f +47 7764 5216
Trond.Trosterud (a) hum.uit.no http://www.hum.uit.no/a/trond/
----------------------------------------------------------------------




