[Corpora-List] Most common non-Romance, non-Germanic words in English

Christian Meyer meyer at ukp.informatik.tu-darmstadt.de
Wed Apr 9 12:29:22 CEST 2014


Hi Tristan,


> I'm interested in finding the most frequent words in English which do not have an origin in any Romance or
> Germanic language. Does anyone know if such a list is available anywhere?

The best data you can get is presumably from the OED. In his keynote speech at the recent eLex, John Simpson showed "the OED in two minutes", which is essentially a visualization of when, and from which region, words entered the English language. A video of the talk is available from http://eki.ee/elex2013/videos/. AFAIK, the platform has not been released yet, but once it is, you could try collecting the lemmas from the "other" category, which, I guess, is what you are looking for. Obviously, it will still be a lot of work to extract the lemmas, and they are not ordered by frequency. I am, however, not aware of an accessible, ready-to-use list.
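
If you do get hold of such a lemma list, the frequency ordering could be done against any corpus you have at hand. A rough sketch in Python, under some loud assumptions: `other_lemmas` and `sample_text` are made-up placeholders (not OED data), the function name `rank_by_frequency` is hypothetical, and the tokenization is deliberately crude (lowercased alphabetic runs, no lemmatization), so inflected forms would not be counted.

```python
from collections import Counter
import re


def rank_by_frequency(lemmas, text):
    """Return (lemma, count) pairs for the given lemmas, most frequent first."""
    # Crude tokenization: lowercase alphabetic runs; no lemmatization is done.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)

    # Deduplicate the lemma list and look up each lemma's corpus count.
    wanted = {lemma.lower() for lemma in lemmas}
    ranked = [(lemma, counts[lemma]) for lemma in wanted]

    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked


# Hypothetical input: placeholder lemmas and a toy "corpus" for illustration.
other_lemmas = ["tea", "taboo", "ketchup"]
sample_text = "Tea with ketchup is taboo. More tea, less ketchup, more tea."

for lemma, count in rank_by_frequency(other_lemmas, sample_text):
    print(lemma, count)
```

With a real corpus you would probably want to replace the regex with a proper tokenizer and lemmatizer, and read the lemma list from whatever export the OED platform eventually offers.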

Best, Christian


