[Corpora-List] Corpus character diversity index

Hugh Paterson III sil.linguist at gmail.com
Wed Oct 17 08:45:41 CEST 2018


Has anyone here read, seen, or heard of anything like a "character diversity index"? [Character as in a Unicode character or an orthographic character.]

I would like some way to express the relative diversity of characters contained in a corpus. Some corpora are small and have a lot of different characters, while others are large and have relatively few. So, is there an accepted measure or expression of character diversity relative to the size of the corpus?
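
To make the question concrete, here is a minimal sketch of the kind of number I have in mind, assuming Python and counting each Unicode code point as one character. It computes a character type-token ratio (distinct characters divided by total characters) and the Shannon entropy of the character distribution. Neither is offered as an accepted standard; the function name char_diversity and the choice of these two measures are assumptions made only for illustration.

```python
import math
from collections import Counter


def char_diversity(text):
    """Return (distinct, total, type_token_ratio, entropy_bits) for text.

    Hypothetical helper: each Unicode code point counts as one character,
    so a decomposed accented letter counts as two separate characters.
    """
    counts = Counter(text)
    total = sum(counts.values())
    if total == 0:
        # An empty corpus has no characters and no diversity to measure.
        return 0, 0, 0.0, 0.0
    distinct = len(counts)
    # Type-token ratio: distinct characters relative to corpus size.
    ttr = distinct / total
    # Shannon entropy in bits: higher when usage is spread over more characters.
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return distinct, total, ttr, entropy


if __name__ == "__main__":
    print(char_diversity("some small sample corpus"))
```

A raw type-token ratio tends to fall as a corpus grows, so comparing corpora of different sizes would probably require computing it over samples of equal length. Whether code points, normalized forms, or grapheme clusters are the right unit is a separate choice that this sketch does not settle.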

Thank you, all the best
- Hugh Paterson III