[Corpora-List] Most common non-Romance, non-Germanic words in English

Tristan Miller miller at ukp.informatik.tu-darmstadt.de
Tue Apr 8 14:56:41 CEST 2014


Dear all,

I'm interested in finding the most frequent words in English which do not have an origin in any Romance or Germanic language. Does anyone know if such a list is available anywhere?

If not, I suppose I could produce one myself easily enough by taking a raw frequency list (such as Adam Kilgarriff's BNC lemma counts), querying each entry in a machine-readable dictionary which provides etymological information, and filtering appropriately. But that presupposes that such a dictionary exists. Does anyone know of a suitable freely available dictionary for this task? Since I'd need to automatically query many thousands of words, I'd want something that I can download for offline use and access through an API. I could try accessing an offline dump of Wiktionary using the JWKTL API, though I suspect Wiktionary's etymological coverage is too spotty.
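A minimal sketch of that filtering step, in Python, might look like the following. It assumes the frequency list is already loaded as (lemma, count) pairs and that the etymological dictionary has been reduced to a mapping from lemma to a single language-family label. The function name, the label names, and the toy entries are hypothetical and for illustration only; the counts are made up and are not BNC figures.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Assumption: families are given as lowercase labels such as "germanic".
EXCLUDED_FAMILIES: Set[str] = {"romance", "germanic"}


def non_romance_germanic(
    freq_list: Iterable[Tuple[str, int]],
    etymology: Dict[str, str],
    excluded: Set[str] = EXCLUDED_FAMILIES,
) -> Tuple[List[Tuple[str, int]], List[str]]:
    """Return (kept, unknown).

    kept    -- (lemma, count) pairs whose family is not excluded,
               sorted by descending count
    unknown -- lemmas with no etymology entry, reported separately
               so that gaps in coverage stay visible
    """
    kept: List[Tuple[str, int]] = []
    unknown: List[str] = []
    for lemma, count in freq_list:
        family = etymology.get(lemma)
        if family is None:
            unknown.append(lemma)
        elif family.lower() not in excluded:
            kept.append((lemma, count))
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept, unknown


# Toy input: invented counts, hand-assigned family labels.
toy_freq = [("water", 500), ("very", 400), ("robot", 30),
            ("sauna", 5), ("blorp", 1)]
toy_etym = {"water": "germanic", "very": "romance",
            "robot": "slavic", "sauna": "uralic"}

ranked, unresolved = non_romance_germanic(toy_freq, toy_etym)
```

Keeping the unresolved lemmas in a separate list, rather than dropping them, gives a rough measure of how spotty the dictionary's etymological coverage is for the frequency range of interest.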

Regards, Tristan

-- 
Tristan Miller, Research Scientist
Ubiquitous Knowledge Processing Lab (UKP-TUDA)
Department of Computer Science, Technische Universität Darmstadt
Tel: +49 6151 16 6166 | Web: http://www.ukp.tu-darmstadt.de/



