[Corpora-List] Google searches as linguistic evidence
d.maynard at dcs.shef.ac.uk
Thu Dec 7 14:37:01 CET 2006
Ramesh Krishnamurthy wrote:
> I suspect many are typos. People are far less fussy about
> proof-reading website information.
Is this a fact, or just a gut reaction?
Obviously it differs depending on what type of material we're talking about:
blogs are much more likely to contain typos, spelling mistakes, etc.
But if we're talking about factual websites, are people less fussy about
proofreading? I'm not sure my gut feeling is the same, but no doubt
there is evidence.
If we use the whole web as a corpus, there will clearly be more mistakes
than in, e.g., the BNC. That's my point about having to weed out the
rubbish if you do want to use the web as a reliable source.