[Corpora-List] is language identification solved

Miles Osborne milesosb at gmail.com
Fri Jun 26 10:37:18 CEST 2015


"Solved" usually means that no more progress can be made, not that we have 100% accuracy.

I would suggest that for major, well-represented languages it will be hard to make further progress. So yes, telling whether a news story is in English is probably solved.

Dealing with short texts, especially those involving code-switching and non-standard language, is far more challenging.

Miles


