[Corpora-List] Most common non-Romance, non-Germanic words in English

Tristan Miller miller at ukp.informatik.tu-darmstadt.de
Wed Apr 9 18:38:55 CEST 2014

Dear Christian,

On 09/04/14 12:29 PM, Christian Meyer wrote:
>> I'm interested in finding the most frequent words in English which
>> do not have an origin in any Romance or Germanic language. Does
>> anyone know if such a list is available anywhere?
> The best data you can get is presumably from the OED. In his keynote
> speech at the recent eLex, John Simpson showed "the OED in two
> minutes", which is essentially a visualization of the time when and
> the region from which words entered the English language. A video of
> the talk is available from http://eki.ee/elex2013/videos/. AFAIK, the
> platform is not yet released(?), but if it is, you could try
> collecting the lemmas from the "other" category, which - I guess - is
> what you are looking for. Obviously, it will still be a lot of work
> to extract the lemmas - and they are not ordered by frequency. I am,
> however, not aware of an accessible, ready-to-use list.

Thanks for this. I haven't seen the video yet, though there's something similar to what you describe on the OED's website at <http://www.oed.com/timelines>. This tool allows you to graph and list (in short portions) words by origin, though not by frequency.
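
If the lemmas for a given origin category could be exported, ordering them by frequency would come down to counting them against a corpus. Here is a minimal sketch in Python, assuming a hypothetical list of lemmas and an already tokenized corpus. Neither input comes from the OED tool, and the sample words are made-up placeholders rather than verified etymologies:

```python
from collections import Counter


def rank_by_frequency(origin_lemmas, corpus_tokens):
    """Return (lemma, count) pairs for the given lemmas, most frequent first.

    Both arguments are hypothetical inputs: origin_lemmas would be the words
    collected from one origin category, corpus_tokens any tokenized corpus.
    Matching is a plain case-insensitive string comparison; no lemmatization
    is attempted here.
    """
    wanted = {lemma.lower() for lemma in origin_lemmas}
    counts = Counter(
        token.lower() for token in corpus_tokens if token.lower() in wanted
    )
    return counts.most_common()


# Placeholder data for illustration only.
other_origin = ["tea", "ketchup", "shampoo"]
tokens = "I drank tea with ketchup and more tea".split()
ranked = rank_by_frequency(other_origin, tokens)
```

Lemmas that never occur in the corpus are simply absent from the result, and a real corpus would need lemmatization before counting.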

Regards, Tristan

-- 
Tristan Miller, Research Scientist
Ubiquitous Knowledge Processing Lab (UKP-TUDA)
Department of Computer Science, Technische Universität Darmstadt
Tel: +49 6151 16 6166 | Web: http://www.ukp.tu-darmstadt.de/
