[Corpora-List] Question about evaluation

Lars Buitinck L.J.Buitinck at uva.nl
Mon Dec 3 12:33:19 CET 2012

2012/12/3 <corpora-request at uib.no>:
> Date: Sun, 2 Dec 2012 17:13:55 -0500
> From: Emad Mohamed <emohamed at umail.iu.edu>
> Subject: [Corpora-List] Question about evaluation
> To: "corpora at uib.no" <corpora at uib.no>
> Hello Corpora members,
> I have a corpus of 80,000 words in which each word is assigned either the
> class S or the class E. Class S occurs 72,000 times while class E occurs
> 8,000 times only.
> I'm wondering what the best way to evaluate the classifier performance
> should be. I have randomly selected a dev set (5%) and a test set (10%).

The most common evaluation metric for classification with skewed class distributions is F1-score:

F1 = 2 * P * R / (P + R)

where P and R are precision and recall, as defined on the webpage you linked to. Note that F1 is computed with respect to whichever class you designate "positive", and the score is not the same for the two choices: with a skewed distribution, the usual convention is to take the minority class (E, in your case) as positive, so that the score reflects performance on the rare class.
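As a minimal sketch, the formula above can be computed directly from raw counts; the numbers below are hypothetical, just to illustrate the arithmetic:

```python
def f1_score(tp, fp, fn):
    """F1 from true positive, false positive, and false negative counts
    for whichever class is designated 'positive'."""
    p = tp / (tp + fp)  # precision
    r = tp / (tp + fn)  # recall
    return 2 * p * r / (p + r)

# Hypothetical counts for class E taken as positive:
print(f1_score(tp=6000, fp=1000, fn=2000))  # -> 0.8
```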

Accuracy is another single-figure summary of classifier performance, but it is misleading for problems like this one: a classifier that always predicts S achieves 90% accuracy (72,000/80,000) while never identifying a single instance of E.
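The baseline figure is just the majority-class proportion, using the counts from the original post:

```python
# Accuracy of the trivial "always predict S" baseline on the
# class distribution from the post (72,000 S vs 8,000 E).
n_s, n_e = 72_000, 8_000
baseline_accuracy = n_s / (n_s + n_e)
print(baseline_accuracy)  # -> 0.9
```

Any real classifier has to beat this number just to demonstrate it learned anything.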

> I'm mainly interested in predicting which words are class E.

If you want a more detailed evaluation, you might compute recall and precision separately in addition to F1 score, with E as the "positive" class. However, recall and precision each measure the absence of only one type of error (false positives for precision, false negatives for recall) while F1 score takes both into account.
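A sketch of computing precision and recall separately with E as the positive class, from paired gold and predicted label sequences (the toy labels below are hypothetical):

```python
def precision_recall(y_true, y_pred, positive="E"):
    """Precision and recall for the chosen positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fp), tp / (tp + fn)

# Hypothetical toy labels:
y_true = ["S", "S", "E", "E", "S", "E"]
y_pred = ["S", "E", "E", "S", "S", "E"]
p, r = precision_recall(y_true, y_pred)
print(p, r)  # both 2/3 here: one false positive, one false negative
```

Reporting the pair (P, R) alongside F1 shows whether your errors are mostly spurious E predictions (low precision) or missed E instances (low recall).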


-- Lars Buitinck Scientific programmer, ILPS University of Amsterdam
