I would like to build a decision tree (or any other relevant supervised classifier) on a data set containing 0.1% "Yes" and 99.9% "No", using several attributes (12 for now, but I still have to tune that). I use Weka, which is totally awesome.
My goal is to prune the search space for another application (i.e., remove, say, 80% of the data that is very unlikely to be "Yes"), which is why I'm trying to use a decision tree. Of course, some algorithms just return a single-leaf tree tagged "No": it is 99.9% accurate, but it guarantees I will always discard my entire search space rather than prune it.
My problem is: is there a way (an algorithm? a piece of software?) to build a tree that maximises recall, i.e., gets every "Yes" element tagged "Yes"? I don't really care about precision (it's OK if many "No" elements are tagged "Yes"; I can handle false positives).
In other words, is there a way to build a decision tree under the constraint of 100% recall?
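To make that concrete, the closest thing I could imagine is a cost-sensitive setup: wrapping J48 in Weka's CostSensitiveClassifier and making a false negative (a "Yes" classified as "No") far more expensive than a false positive. The file name, the class indices (0 = "No", 1 = "Yes") and the cost value below are just placeholders, and the CostMatrix row/column convention is my reading of the docs, so this is only a rough sketch of the idea:

import weka.classifiers.CostMatrix;
import weka.classifiers.meta.CostSensitiveClassifier;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class HighRecallTree {
    public static void main(String[] args) throws Exception {
        // Load the data (placeholder file name); class attribute is assumed to be last.
        Instances data = new DataSource("mydata.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        // Cost matrix: rows = actual class, columns = predicted class.
        // Assuming class index 0 = "No" and 1 = "Yes":
        // a false positive (actual "No", predicted "Yes") costs 1,
        // a false negative (actual "Yes", predicted "No") costs 1000.
        CostMatrix costs = new CostMatrix(2);
        costs.setCell(0, 1, 1.0);     // "No" tagged "Yes" -> cheap, I can handle these
        costs.setCell(1, 0, 1000.0);  // "Yes" tagged "No" -> very expensive

        CostSensitiveClassifier csc = new CostSensitiveClassifier();
        csc.setClassifier(new J48());         // base decision tree learner
        csc.setCostMatrix(costs);
        csc.setMinimizeExpectedCost(true);    // predict the class with the lowest expected cost
        csc.buildClassifier(data);

        System.out.println(csc);
    }
}

Raising the false-negative cost should push recall towards 100%, but as far as I can tell it still isn't a hard constraint, hence my question.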
I'm not sure I've made myself clear, and I'm not sure there is a solution to my problem.
Regards,
-- Emmanuel