[Corpora-List] An ignorant question concerning the basics of statistical significance > REPRESENTATIVENESS (Kokil Jaidka)

#KOKIL JAIDKA# KOKI0001 at e.ntu.edu.sg
Wed Feb 4 02:45:54 CET 2015

Hi all

I was recently trying to compare two web discourses and ran into this problem myself. I tried to solve it by estimating a word-level multinomial distribution for each web corpus (see Blei, 2012), which roughly models the topics of each corpus, and then comparing the variance of words and topics between the corpora (following Kilgarriff, 2001). This also got me around the problem that one corpus was much shorter than the other (though it was, in itself, a complete representation of what I wanted).
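To make the idea concrete, here is a minimal sketch of the word-level part only, not the exact procedure I ran. It assumes each corpus is already tokenized into a list of strings; the function names `word_distribution` and `chi_square_top_n` and the cutoff `n` are made up for illustration. It leaves out the topic-modelling step entirely and uses a chi-square statistic over the most frequent words, in the spirit of Kilgarriff (2001). Working with relative frequencies and expected counts is what makes the comparison tolerant of very different corpus sizes.

```python
from collections import Counter


def word_distribution(tokens):
    """Return relative word frequencies (a unigram multinomial estimate)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def chi_square_top_n(tokens_a, tokens_b, n=500):
    """Chi-square statistic over the n most frequent words of both corpora combined.

    Illustrative helper: larger values suggest the corpora are less alike.
    """
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    size_a, size_b = sum(counts_a.values()), sum(counts_b.values())
    joint = counts_a + counts_b
    stat = 0.0
    for word, joint_count in joint.most_common(n):
        # Expected counts if both corpora were drawn from one shared distribution,
        # scaled by each corpus's share of the total tokens.
        expected_a = joint_count * size_a / (size_a + size_b)
        expected_b = joint_count * size_b / (size_a + size_b)
        stat += (counts_a[word] - expected_a) ** 2 / expected_a
        stat += (counts_b[word] - expected_b) ** 2 / expected_b
    return stat
```

Two corpora with the same word proportions give a statistic of zero regardless of their lengths, so the short corpus is not penalized simply for being short.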

I'd like to know what the community thinks about this approach.


Kokil Jaidka

PhD, Nanyang Technological University Singapore.
