[Corpora-List] Comparing word lengths

Muhammad Shakir Aziz true.friend2004 at gmail.com
Tue Jan 3 13:18:22 CET 2017


Dear Corpora members,

I am working with online conversational texts that contain many shorthand spellings. I have normalized these spellings, either to longer standard forms (e.g. "brother" for "bro") or to shorter standard forms (e.g. "so" for "sooooooooo"). Since word length is an important variable in my analysis, I want to make sure that there is no significant overall difference between the normalized and non-normalized texts.

My question: is it OK to simply compare mean word lengths for each corpus category? Or should I put the mean score for each file in two columns (normalized versus non-normalized) and apply a significance test?

PS: My guess is that at most about 10% of the words are affected by the normalization, but I want to make sure the effect is negligible.

Regards
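For illustration, the second option in the question (one mean per file in two columns, then a significance test) could be set up roughly as follows. This is only a sketch under stated assumptions: the tokenizer is a plain `\w+` regex, the helper names `mean_word_length` and `paired_sign_flip_test` are hypothetical, the toy sentences are invented, and the exact sign-flip permutation test is just one possible choice of paired test (a paired t-test or a Wilcoxon signed-rank test would be alternatives). Because each file exists in both versions, the two columns are paired, so the test works on the per-file differences.

```python
import re
from itertools import product
from statistics import mean


def mean_word_length(text):
    """Mean token length, using a simple \\w+ tokenizer (an assumption).

    Raises statistics.StatisticsError if the text contains no tokens.
    """
    words = re.findall(r"\w+", text)
    return mean(len(w) for w in words)


def paired_sign_flip_test(xs, ys):
    """Exact two-sided sign-flip permutation test on paired differences.

    Returns (mean difference, p-value). All 2**n sign assignments are
    enumerated, so this is only practical for a small number of files;
    for larger n one would sample random sign flips instead.
    """
    diffs = [x - y for x, y in zip(xs, ys)]
    observed = abs(mean(diffs))
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(mean(s * d for s, d in zip(signs, diffs)))
        # Small tolerance guards against floating-point ties.
        if flipped >= observed - 1e-12:
            extreme += 1
    return mean(diffs), extreme / total


# Invented toy "files": the same texts before and after normalization.
raw_files = [
    "sooooo tired bro",
    "gonna see my bro tmrw",
    "thx so much",
    "idk what 2 do",
]
normalized_files = [
    "so tired brother",
    "going to see my brother tomorrow",
    "thanks so much",
    "I do not know what to do",
]

raw_means = [mean_word_length(t) for t in raw_files]
norm_means = [mean_word_length(t) for t in normalized_files]

diff, p_value = paired_sign_flip_test(norm_means, raw_means)
print(f"mean per-file difference: {diff:.3f}, p = {p_value:.3f}")
```

A non-significant result does not by itself show that the difference is negligible, so it may be more informative to report the mean per-file difference (and its spread) alongside any p-value.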