[Corpora-List] Decision tree : maximise recall over precision

Eric Atwell eric
Tue Apr 21 17:20:18 CEST 2009


Surely a good decision procedure is "JUST SAY NO!" - "only" 99.9% accurate! I wish PoS-taggers and other text annotation tools were as good!
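
To make the point concrete, here is a minimal sketch of why "just say no" scores so well on accuracy and so badly on recall. The data set is hypothetical (1 "Yes" among 1000 instances, matching the 0.1% figure in the question), and the helper name accuracy_and_recall is invented for illustration.

```python
def accuracy_and_recall(predicted, actual, positive="Yes"):
    """Return (accuracy, recall of the positive class) for two label lists."""
    pairs = list(zip(predicted, actual))
    correct = sum(p == a for p, a in pairs)
    true_pos = sum(p == positive and a == positive for p, a in pairs)
    actual_pos = sum(a == positive for a in actual)
    accuracy = correct / len(actual)
    # Recall is undefined with no positives; report 0.0 in that case.
    recall = true_pos / actual_pos if actual_pos else 0.0
    return accuracy, recall


# Hypothetical data: 0.1% "Yes", 99.9% "No", as in the original question.
actual = ["Yes"] * 1 + ["No"] * 999

# The one-leaf "tree" that answers "No" for everything.
always_no = ["No"] * len(actual)

accuracy, recall = accuracy_and_recall(always_no, actual)
print(f"accuracy={accuracy:.3f} recall={recall:.3f}")
```

Accuracy rewards the majority class, so a learner that optimises it has no incentive to find the rare "Yes" cases; recall on "Yes" is the number to watch here.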

It sounds like you want to find out how to set a WEKA decision-tree builder NOT to prune any branches. That question is better put to the WEKA mailing list, wekalist at list.scms.waikato.ac.nz; see https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist to join.

Eric Atwell, Leeds University

PS - please let me know if you find the answer - this looks like an interesting class coursework exercise!

On Tue, 21 Apr 2009, Emmanuel Prochasson wrote:

> Dear all,
> I would like to build a decision tree (or whatever supervised classifier
> is relevant) on a set of data containing 0.1% "Yes" and 99.9% "No", using
> several attributes (12 for now, but I have to tune that). I use Weka,
> which is totally awesome.
> My goal is to prune the search space for another application (i.e. remove,
> say, 80% of the data that are very unlikely to be "Yes"), which is why I'm
> trying to use a decision tree. Of course, some algorithms return a one-leaf
> tree tagged "No", with 99.9% accuracy, which is pretty good,
> but that ensures I will always discard my whole search space rather than
> prune it.
> My problem is: is there a way (algorithm? software?) to build a tree
> that will maximise recall (all "Yes" elements tagged "Yes" by the
> algorithm)? I don't really care about precision (it's OK if many "No"
> elements are tagged "Yes"; I can handle false positives).
> In other words, is there a way to build a decision tree under the
> constraint of 100% recall?
> I'm not sure I made myself clear, and I'm not sure there are solutions
> for my problem.
> Regards,
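
One generic way to trade precision for recall, offered only as a sketch and not as anything a particular WEKA tree builder does, is to have the classifier output a score for "Yes" and then choose the highest cut-off that still keeps every known "Yes". The scores, labels and function names below are all made up for illustration.

```python
def recall_preserving_threshold(scores, labels, positive="Yes"):
    """Highest cut-off that still keeps every positive calibration example."""
    positive_scores = [s for s, y in zip(scores, labels) if y == positive]
    if not positive_scores:
        raise ValueError("no positive examples to calibrate on")
    return min(positive_scores)


def keep_indices(scores, threshold):
    """Indices of the instances that survive pruning (score >= threshold)."""
    return [i for i, s in enumerate(scores) if s >= threshold]


# Hypothetical classifier scores for "Yes" and the true labels.
scores = [0.05, 0.10, 0.40, 0.02, 0.55, 0.90, 0.01, 0.20, 0.03, 0.04]
labels = ["No", "No", "Yes", "No", "No", "Yes", "No", "No", "No", "No"]

threshold = recall_preserving_threshold(scores, labels)
kept = keep_indices(scores, threshold)
pruned_fraction = 1 - len(kept) / len(scores)

print(f"threshold={threshold} kept={kept} pruned={pruned_fraction:.0%}")
```

Note that the 100% recall holds only for the data used to pick the cut-off; on unseen data a "Yes" may score lower, so the threshold is best chosen on held-out examples, possibly with a safety margin.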

-- Eric Atwell,

Senior Lecturer, Language research group, School of Computing,

Faculty of Engineering, UNIVERSITY OF LEEDS, Leeds LS2 9JT, England

TEL: 0113-3435430 FAX: 0113-3435468 WWW/email: google Eric Atwell
