[Corpora-List] Question about evaluation

Emad Mohamed emohamed at umail.iu.edu
Sun Dec 2 23:13:55 CET 2012


Hello Corpora members, I have a corpus of 80,000 words in which each word is assigned either class S or class E. Class S occurs 72,000 times, while class E occurs only 8,000 times. I'm wondering what the best way to evaluate classifier performance would be. I have randomly selected a dev set (5%) and a test set (10%). I'm mainly interested in predicting which words are class E.
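To make the class imbalance concrete, here is a minimal sketch of the arithmetic implied by the counts above. The helper name `majority_baseline_accuracy` is hypothetical, and the expected number of class E tokens assumes the random test sample roughly preserves the corpus proportions, which a purely random split does not guarantee.

```python
# Counts taken from the post; the helper name and the "expected" figures
# are illustrative assumptions, not results from an actual experiment.
total_words = 80_000
counts = {"S": 72_000, "E": 8_000}


def majority_baseline_accuracy(class_counts):
    """Accuracy of a trivial classifier that always predicts the most frequent class."""
    return max(class_counts.values()) / sum(class_counts.values())


baseline = majority_baseline_accuracy(counts)

# Split sizes from the stated percentages (integer arithmetic avoids float rounding).
dev_size = total_words * 5 // 100
test_size = total_words * 10 // 100

# Expected class E tokens in the test set, assuming the sample keeps the corpus ratio.
expected_e_in_test = test_size * counts["E"] // total_words

print(f"majority baseline accuracy: {baseline:.2%}")
print(f"dev size: {dev_size}, test size: {test_size}")
print(f"expected class E tokens in test: {expected_e_in_test}")
```

A classifier that always answers S would already score about 90% accuracy while finding no class E words at all, which is why plain accuracy is a weak measure for this setup.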

I've read this page: webdocs.cs.ualberta.ca/~eisner/measures.html but I'm still a little bit confused. Do we use specificity in linguistics papers? Should I report these measures for each of the two classes, or as a single overall number? Does this make sense / a difference?
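As a rough illustration of the per-class question, the sketch below computes precision, recall, F1 and specificity for each class in turn, treating that class as the positive one. The function name `binary_metrics` and the ten-token gold/predicted lists are made up for the example; they are not data from the corpus described here.

```python
def binary_metrics(gold, pred, positive):
    """Precision, recall, F1, specificity and accuracy with `positive` as the target class."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    tn = sum(1 for g, p in pairs if g != positive and p != positive)

    def safe_div(num, den):
        # Return 0.0 instead of raising when a class is never predicted or never occurs.
        return num / den if den else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
        "specificity": safe_div(tn, tn + fp),
        "accuracy": safe_div(tp + tn, len(pairs)),
    }


# Toy data (hypothetical): 8 gold S tokens and 2 gold E tokens.
gold = ["S"] * 8 + ["E"] * 2
pred = ["S"] * 7 + ["E"] + ["E", "S"]

report = {label: binary_metrics(gold, pred, label) for label in ("S", "E")}
for label, scores in report.items():
    print(label, {name: round(value, 3) for name, value in scores.items()})
```

With only two classes, the specificity computed for class E is the same quantity as the recall for class S, so reporting precision, recall and F1 for each class separately already covers what specificity would add. Accuracy is identical whichever class is taken as positive.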

Thank you so much.

-- Emad Mohamed aka Emad Nawfal
Université du Québec à Montréal


