[Corpora-List] Google "region"-based searches

Tristan Miller miller at ukp.informatik.tu-darmstadt.de
Wed Nov 28 13:56:12 CET 2012


Greetings.

On 28/11/12 01:25 PM, Trevor Jenkins wrote:
> On 28 Nov 2012, at 11:48, Roland Schäfer <roland.schaefer at fu-berlin.de> wrote:
>
>> Whatever Google use: IP-based geolocation is totally unreliable as far
>> as language varieties are concerned.
>
> Definitely. My current ISP has various nodes connecting to the Internet.
> My connections appear to be in either Bangor in north Wales or in
> Winchester in southern England but never where I'm actually located.

I don't think you can use single cases like this to make blanket statements about the "total unreliability" of geolocation. Sure, the user of any one IP can't be pinpointed with certainty to the nearest square centimetre, but neither is geolocation totally random. Were we to analyze a large enough sample of geolocations, we could probably conclude that m% of all IPs can be correctly resolved geographically to within an n-kilometre radius.

For large enough areas (say, entire countries), the accuracy of geolocation may be high enough to support informed estimates of the distribution of coarse-grained language varieties. For example, given a large enough random sample of English texts written by people whose IPs resolve to Ireland, could we not reasonably expect the distribution of language varieties in those texts to roughly match that of the Irish population in general, or at least that portion of it which is online?
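To make the "m% within n kilometres" idea concrete, here is a rough Python sketch of how one might estimate m for a chosen n. It assumes we somehow have a sample of IPs for which both the true location and the geolocated position are known; the function names and the sample coordinates are invented purely for illustration, and the Wilson score interval is just one reasonable way to attach an uncertainty to the estimate.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    earth_radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(min(1.0, a)))


def share_within_radius(pairs, n_km, z=1.96):
    """Estimate the percentage of IPs geolocated to within n_km of the truth.

    pairs is a list of ((true_lat, true_lon), (geo_lat, geo_lon)) tuples.
    Returns (m, low, high): the observed percentage and an approximate
    95% Wilson score interval (for the default z), all in percent.
    """
    if not pairs:
        raise ValueError("need at least one (true, geolocated) pair")
    total = len(pairs)
    hits = sum(1 for true, geo in pairs
               if haversine_km(*true, *geo) <= n_km)
    p = hits / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total
                         + z * z / (4 * total * total)) / denom
    return 100 * p, 100 * max(0.0, centre - half), 100 * min(1.0, centre + half)


# Invented sample: two users who are really near Dublin; one is geolocated
# correctly, the other is placed near Cork (a couple of hundred km away).
sample = [
    ((53.35, -6.26), (53.35, -6.26)),
    ((53.35, -6.26), (51.90, -8.47)),
]
m, low, high = share_within_radius(sample, n_km=50)
print(f"{m:.0f}% within 50 km (interval {low:.0f}% to {high:.0f}%)")
```

With a sample this small the interval is of course very wide, which is rather the point: a single anecdote tells us almost nothing either way, while a few thousand pairs would pin m down quite tightly.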

Regards, Tristan

-- 
Tristan Miller, Doctoral Researcher
Ubiquitous Knowledge Processing Lab (UKP-TUDA)
Department of Computer Science, Technische Universität Darmstadt
Tel: +49 6151 16 6166 | Web: http://www.ukp.tu-darmstadt.de/



