[Corpora-List] Google searches as linguistic evidence

maxwell at ldc.upenn.edu
Thu Dec 7 16:24:00 CET 2006


Quoting Ramesh Krishnamurthy <r.krishnamurthy at aston.ac.uk>:

> I don't know of many websites who use professional proof-readers...


I'm sure most readers of this list have already seen this, but just in case:

Christoph Ringlstetter, Klaus U. Schulz and Stoyan Mihov: Orthographic
Errors in Web Pages - Towards Cleaner Web Corpora. Computational
Linguistics, September 2006, Vol. 32(3), pp. 295-340.

One useful output is a classification of websites according to how many
misspellings they contain.

Mike Maxwell
CASL/ U MD






