[Corpora-List] Most common non-Romance, non-Germanic words in English

Darren Cook darren at dcook.org
Thu Apr 10 01:17:43 CEST 2014


Trying again - I keep hitting the spam filter, so I'll try splitting my response up!


> If not, I suppose I could produce one myself easily enough by taking a
> raw frequency list (such as Adam Kilgarriff's BNC lemma counts),
> querying each entry in a machine-readable dictionary which provides
> etymological information, and filtering appropriately. But that
> presupposes that such a dictionary exists. Does anyone know of a
> suitable freely available dictionary for this task?

One approach would be to gather lists of the words of interest:

http://en.wikipedia.org/wiki/List_of_English_words_of_Arabic_origin

http://en.wikipedia.org/wiki/List_of_English_words_of_Japanese_origin

http://en.wikipedia.org/wiki/List_of_English_words_of_Chinese_origin

etc.

As most English words do come from the Romance or Germanic languages, the remaining lists are few enough that this is not an impossible task, though you may need to filter further based on your exact criteria. E.g. tempura entered English from Japanese, but entered Japanese from Portuguese. Admiral comes from a French word which in turn comes from an Arabic word; which does that count as?
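The list-based filtering described above can be sketched roughly as follows, in Python. This is only an illustration under stated assumptions: the function name rank_by_frequency is hypothetical, the counts are made-up placeholders rather than real BNC figures, and the per-language word lists are tiny hand-typed stand-ins for whatever you extract from the Wikipedia pages.

```python
def rank_by_frequency(freq_counts, origin_lists):
    """Return (word, count, origins) for every word that appears in at
    least one origin list, most frequent first (ties broken alphabetically)."""
    # Map each lower-cased word to the set of origin lists that mention it.
    origins_of = {}
    for origin, words in origin_lists.items():
        for word in words:
            origins_of.setdefault(word.lower(), set()).add(origin)

    # Keep only the frequency-list entries that occur in some origin list.
    hits = [
        (word, count, sorted(origins_of[word.lower()]))
        for word, count in freq_counts.items()
        if word.lower() in origins_of
    ]
    hits.sort(key=lambda item: (-item[1], item[0]))
    return hits


# Placeholder counts (NOT real corpus figures), standing in for a lemma list.
freq_counts = {"the": 1000, "house": 200, "tea": 80, "admiral": 12, "tempura": 3}

# Stand-ins for the per-language lists; a word may sit in more than one list.
origin_lists = {
    "Arabic": ["admiral"],
    "Japanese": ["tempura"],
    "Chinese": ["tea"],
}

ranked = rank_by_frequency(freq_counts, origin_lists)
for word, count, origins in ranked:
    print(word, count, ", ".join(origins))
```

Returning the origins alongside each word leaves the borderline cases (tempura, admiral) visible, so they can be kept or dropped by hand according to your own criteria.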

Darren


