I'm interested in finding the most frequent words in English which do not have an origin in any Romance or Germanic language. Does anyone know if such a list is available anywhere?
If not, I suppose I could produce one myself easily enough by taking a raw frequency list (such as Adam Kilgarriff's BNC lemma counts), querying each entry in a machine-readable dictionary which provides etymological information, and filtering appropriately. But that presupposes that such a dictionary exists. Does anyone know of a suitable freely available dictionary for this task? Since I'd need to automatically query many thousands of words, I'd want something that I can download for offline use and access through an API. I could try accessing an offline dump of Wiktionary using the JWKTL API, though I suspect Wiktionary's etymological coverage is too spotty.
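The filtering step described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the frequency list is taken to have four whitespace-separated fields per line (rank, frequency, word, part of speech), and the etymological dictionary is a hypothetical tab-separated file mapping each word to a single language family. No such file is known to exist; the names `load_etymologies` and `filter_frequency_list`, and all sample values, are invented for the example.

```python
# Families to filter out; the labels are an assumption about how the
# (hypothetical) etymological dictionary names them.
EXCLUDED_FAMILIES = {"germanic", "romance"}


def load_etymologies(lines):
    """Parse hypothetical 'word<TAB>family' lines into a dict."""
    origins = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        word, family = line.split("\t")
        origins[word.lower()] = family.lower()
    return origins


def filter_frequency_list(freq_lines, origins, excluded=EXCLUDED_FAMILIES):
    """Return (word, frequency) pairs whose known origin is not excluded,
    sorted by descending frequency."""
    results = []
    for line in freq_lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # not in the assumed 'rank freq word pos' format
        _rank, freq, word, _pos = fields
        family = origins.get(word.lower())
        if family is None:
            continue  # no etymology recorded: skip rather than guess
        if family not in excluded:
            results.append((word, int(freq)))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results


# Toy data: the frequencies are invented, and the one-label-per-word
# etymologies are a simplification for illustration only.
sample_frequencies = [
    "1 100 the det",
    "2 50 algebra n",
    "3 40 table n",
    "4 30 tea n",
    "5 20 zzz n",
]
sample_etymologies = [
    "the\tGermanic",
    "algebra\tSemitic",
    "table\tRomance",
    "tea\tSinitic",
]

origins = load_etymologies(sample_etymologies)
filtered = filter_frequency_list(sample_frequencies, origins)
print(filtered)
```

Words with no etymology entry are dropped rather than kept, so spotty dictionary coverage would shrink the output instead of polluting it. A single family label per word is also a simplification, since many loanwords reached English through an intermediate Romance or Germanic language.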
-- 
Tristan Miller, Research Scientist
Ubiquitous Knowledge Processing Lab (UKP-TUDA)
Department of Computer Science, Technische Universität Darmstadt
Tel: +49 6151 16 6166 | Web: http://www.ukp.tu-darmstadt.de/