I wonder whether anyone here has read about, knows of, or has seen anything like a "character diversity index"? (By "character" I mean a Unicode character or an orthographic character.)
I would like some way to express the relative diversity of the characters contained in a corpus. Some corpora are small and have a lot of different characters, while others are large and have relatively few. So, is there some accepted measure or expression of the diversity relative to the size of the corpus?
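To make the question concrete, here is a rough sketch of the kind of numbers I have in mind. It is only an illustration, not an accepted index: it computes a character-level type/token ratio (distinct characters over total characters) and the Shannon entropy of the character distribution. The function name `char_diversity` is made up for this example, and it assumes that "character" means a Unicode code point; orthographic characters (grapheme clusters, digraphs) would need a segmentation step first.

```python
import math
from collections import Counter


def char_diversity(text):
    """Return simple character-diversity figures for a string.

    Assumption: one character = one Unicode code point, which is
    what iterating over a Python str yields.
    """
    counts = Counter(text)          # frequency of each code point
    total = sum(counts.values())    # corpus size in characters
    distinct = len(counts)          # size of the character inventory

    if total == 0:
        return {"total": 0, "distinct": 0, "ratio": 0.0, "entropy_bits": 0.0}

    # Shannon entropy in bits: higher means the characters are
    # spread more evenly over a larger inventory.
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())

    return {
        "total": total,
        "distinct": distinct,
        "ratio": distinct / total,  # character type/token ratio
        "entropy_bits": entropy,
    }


if __name__ == "__main__":
    print(char_diversity("aabb"))
```

The plain ratio shrinks as a corpus grows, because the inventory stops growing while the text does not, so on its own it does not compare corpora of different sizes fairly. That size dependence is exactly what I would like a measure to account for.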
Thank you, all the best - Hugh Paterson III