[Corpora-List] Decision tree : maximise recall over precision

Emmanuel Prochasson emmanuel.prochasson
Tue Apr 21 15:38:36 CEST 2009


Dear all,

I would like to build a decision tree (or whatever supervised classifier is relevant) on a data set containing 0.1% "Yes" and 99.9% "No", using several attributes (12 for now, but I still have to tune that). I use Weka, which is totally awesome.

My goal is to prune the search space for another application (i.e., remove, say, 80% of the data that are very unlikely to be "Yes"), which is why I'm trying to use a decision tree. Of course, some algorithms return a single-leaf tree tagged "No", with 99.9% accuracy, which looks good but guarantees that I will always discard my entire search space rather than prune it.
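To make the single-leaf problem concrete, here is a minimal sketch on made-up labels with the same 0.1% / 99.9% split (the data and the helper names `majority_leaf` and `evaluate` are hypothetical). It shows that the "always No" tree scores 99.9% accuracy while keeping none of the search space, so its recall on "Yes" is zero:

```python
# Made-up labels matching the 0.1% "Yes" / 99.9% "No" split described above.
labels = ["Yes"] * 10 + ["No"] * 9990

def majority_leaf(_example):
    """A single-leaf tree: predicts the majority class "No" for everything."""
    return "No"

def evaluate(labels, predictions, positive="Yes"):
    """Return (accuracy, recall on `positive`, fraction of data kept as `positive`)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == positive and p == positive)
    fn = sum(1 for y, p in pairs if y == positive and p != positive)
    accuracy = sum(1 for y, p in pairs if y == p) / len(pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    kept = sum(1 for _, p in pairs if p == positive) / len(pairs)
    return accuracy, recall, kept

predictions = [majority_leaf(i) for i in range(len(labels))]
accuracy, recall, kept = evaluate(labels, predictions)
print(f"accuracy={accuracy:.3f} recall={recall:.3f} kept={kept:.3f}")
# accuracy=0.999 recall=0.000 kept=0.000
```

A `kept` value of 0 means everything is pruned, which is exactly the failure mode described: high accuracy, useless for pruning.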

My question is: is there a way (an algorithm? software?) to build a tree that maximises recall, i.e. one where all "Yes" elements are tagged "Yes" by the classifier? I don't really care about precision (it's OK if many "No" elements are tagged "Yes"; I can handle false positives).
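One standard way to trade precision for recall is cost-sensitive learning: make a missed "Yes" (false negative) far more expensive than a false alarm. The sketch below is a hypothetical illustration, not Weka's implementation: a one-split decision stump on a single made-up numeric attribute, where `best_stump_threshold`, `fn_cost` and `fp_cost` are names introduced here for illustration. It picks the threshold minimising `fn_cost * FN + fp_cost * FP`:

```python
# Hypothetical single numeric attribute and labels, made up for illustration.
values = [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.45, 0.5, 0.55, 0.6, 0.65,
          0.4, 0.7, 0.9]
labels = ["No"] * 12 + ["Yes"] * 3

def best_stump_threshold(values, labels, fn_cost, fp_cost=1.0):
    """Choose t minimising fn_cost * FN + fp_cost * FP for the rule
    'predict "Yes" iff value >= t' (a one-split decision stump)."""
    # Every observed value is a candidate; +inf means "predict No everywhere".
    candidates = sorted(set(values)) + [float("inf")]
    best_t, best_cost = None, float("inf")
    for t in candidates:
        fn = sum(1 for v, y in zip(values, labels) if y == "Yes" and v < t)
        fp = sum(1 for v, y in zip(values, labels) if y == "No" and v >= t)
        cost = fn_cost * fn + fp_cost * fp
        if cost < best_cost:  # strict: keep the first (lowest) t on ties
            best_t, best_cost = t, cost
    return best_t

plain_t = best_stump_threshold(values, labels, fn_cost=1.0)       # plain error count
weighted_t = best_stump_threshold(values, labels, fn_cost=100.0)  # a missed "Yes" is 100x worse
print(plain_t, weighted_t)  # 0.7 0.4
```

With equal costs the stump gives up the "Yes" at 0.4 to avoid false positives; with a heavy false-negative cost it keeps all three "Yes" examples at the price of five false positives. A full tree learner applies the same weighting at every split.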

In other words, is there a way to build a decision tree under the constraint of 100% recall?
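Another approach, sketched here under the assumption that the classifier can output a score or class-probability estimate for "Yes" (most tree learners can, from leaf class frequencies), is to train normally and then choose a cut-off that no "Yes" example falls below. The scores and labels are made up, and `recall_preserving_threshold` and `prune` are hypothetical helper names:

```python
# Made-up "Yes" scores from some already-trained classifier, with true labels.
scores = [0.02, 0.10, 0.05, 0.30, 0.01, 0.60, 0.08, 0.03, 0.45, 0.04]
labels = ["No", "No", "No", "Yes", "No", "Yes", "No", "No", "No", "No"]

def recall_preserving_threshold(scores, labels, positive="Yes"):
    """Largest t such that every positive has score >= t (100% recall on this data)."""
    positive_scores = [s for s, y in zip(scores, labels) if y == positive]
    if not positive_scores:
        raise ValueError("need at least one positive example")
    return min(positive_scores)

def prune(scores, threshold):
    """Indices kept (score >= threshold); everything else is pruned."""
    return [i for i, s in enumerate(scores) if s >= threshold]

threshold = recall_preserving_threshold(scores, labels)
kept = prune(scores, threshold)
pruned = (len(scores) - len(kept)) / len(scores)
print(f"kept={kept} pruned={pruned}")  # kept=[3, 5, 8] pruned=0.7
```

The 100% recall holds only on the data used to pick the threshold; on unseen data it is safer to choose the cut-off on a held-out set and lower it somewhat as a margin, accepting that less of the search space gets pruned.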

I'm not sure I made myself clear, and I'm not sure there is a solution to my problem.

Regards,

-- Emmanuel


