[Corpora-List] Corpus for transliteration of names

Grishma Jena gjena at seas.upenn.edu
Thu Apr 28 21:14:52 CEST 2016



> Thank you, Martin, for your reply, and apologies for not mentioning the
> details. I have a dataset of person names, mined from Wikipedia, from this
> paper:
> <http://www.cis.upenn.edu/~ccb/publications/transliterating-from-all-languages.pdf>
> I'm considering a subset of those languages, focusing on these in
> particular: pap, jbo, mhr, fur, ilo, rue, or, bcl and, hopefully soon, hi.

I have a list of 1000 candidates for each name, and I'm building a model to predict the correct transliteration among them; this is where the ranker comes into play. So far, I've used the features that were part of the output for each of the 1000 candidates (generated by Joshua). I'm now looking for other features I could use, particularly those from "Named Entity Transliteration and Discovery in Multilingual Corpora" by Klementiev and Roth <http://klementiev.org/publications/learningmt08.pdf>. I hope it's clearer now what I intend to do.
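
A minimal sketch of the reranking setup described above, assuming each name comes with a candidate list and one known correct transliteration. The paired character n-gram features are only an illustrative choice, loosely modelled on substring-pair features, and do not reproduce the method of either paper. The perceptron update is likewise a stand-in for whatever ranker is actually used, and every function name and the toy data are hypothetical.

```python
from collections import Counter


def char_ngrams(word, max_n=2):
    """Return all character n-grams of length 1..max_n in a word."""
    return [word[i:i + n]
            for n in range(1, max_n + 1)
            for i in range(len(word) - n + 1)]


def pair_features(source, candidate, max_n=2):
    """Sparse features: counts of (source n-gram, candidate n-gram) pairs."""
    feats = Counter()
    for s in char_ngrams(source, max_n):
        for t in char_ngrams(candidate, max_n):
            feats[(s, t)] += 1
    return feats


def score(weights, feats):
    """Dot product of a sparse weight dict and a sparse feature vector."""
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())


def rerank(weights, source, candidates):
    """Sort candidates by model score, best first (ties keep input order)."""
    return sorted(candidates,
                  key=lambda c: score(weights, pair_features(source, c)),
                  reverse=True)


def train_perceptron(examples, epochs=5):
    """examples: list of (source, candidates, gold) triples (hypothetical format)."""
    weights = {}
    for _ in range(epochs):
        for source, candidates, gold in examples:
            best = rerank(weights, source, candidates)[0]
            if best != gold:
                # Move weights toward the gold candidate, away from the wrong guess.
                for f, v in pair_features(source, gold).items():
                    weights[f] = weights.get(f, 0.0) + v
                for f, v in pair_features(source, best).items():
                    weights[f] = weights.get(f, 0.0) - v
    return weights


# Toy data, invented for illustration only.
toy_examples = [("maria", ["mario", "maria"], "maria")]
toy_weights = train_perceptron(toy_examples)
top_candidate = rerank(toy_weights, "maria", ["mario", "maria"])[0]
```

In practice, the scores that the decoder already outputs for each candidate could be added to the same feature dictionary, so that the new features complement the existing ones rather than replace them.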

Thank you!

--
Regards,
Grishma Jena
MSE Computer and Information Sciences


