[Corpora-List] Decision tree : maximise recall over precision

Bonaventura Coppola
Thu Apr 23 18:11:29 CEST 2009


Hi Emmanuel,

the issue is handled in Weka via Cost Sensitive Classification (see e.g. http://wekadocs.com/node/15), which allows you to provide algorithms with a Cost Matrix expressing per-category penalties for misclassified examples. I am not sure whether you can wrap a cost-sensitive classifier around any learning algorithm in Weka (and decision trees in particular), but it is definitely worth checking.
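To make the idea concrete: a cost-sensitive classifier picks the class that minimizes *expected cost* under the cost matrix, rather than the most probable class. Here is a minimal stdlib-only Python sketch of that decision rule (this is an illustration of the principle, not Weka's actual API; the function name and the 50:1 cost values are made up for the example):

```python
# Sketch of cost-sensitive classification: pick the class that
# minimizes expected misclassification cost, given class
# probabilities and a cost matrix.

def min_cost_class(probs, cost):
    """probs[i]  : estimated P(true class == i)
    cost[i][j]   : penalty for predicting j when the true class is i
    Returns the index of the least-expected-cost class."""
    n = len(probs)
    expected = [sum(probs[i] * cost[i][j] for i in range(n))
                for j in range(n)]
    return min(range(n), key=lambda j: expected[j])

# Class 0 = "No", class 1 = "Yes".  Missing a "Yes" is set to cost
# 50x more than a false alarm, so even a small P(Yes) flips the
# decision to "Yes" -- which is how recall gets favoured.
cost = [[0, 1],    # true "No":  predicting "Yes" costs 1
        [50, 0]]   # true "Yes": predicting "No" costs 50
print(min_cost_class([0.95, 0.05], cost))  # -> 1 ("Yes"), since 0.05*50 > 0.95*1
print(min_cost_class([0.99, 0.01], cost))  # -> 0 ("No")
```

Raising the cost of false negatives relative to false positives is exactly what trades precision away for recall in Emmanuel's setting.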

Best,

Bc.

On 21/apr/09, at 16:33, Eddie Bell wrote:


> Hi Emmanuel,
>
> I recently had a similar unbalanced data-set (98% 'No') and used an
> SVM with prior weights. The prior weights force the model to account
> for the minority category by penalizing classification errors on it
> more heavily than errors on the dominant category (i.e. making
> minority-class accuracy more important).
>
> SVMs aren't as interpretable as decision trees; if trees are
> required, I believe the 'rpart' R package supports weighting. I'm not
> familiar enough with weka to guide you in that respect, but weights
> should help with your problem.
>
> regards
> - eddie
>
> 2009/4/21 Emmanuel Prochasson <emmanuel.prochasson at univ-nantes.fr>:
>> Dear all,
>>
>> I would like to build a decision tree (or whatever supervised
>> classifier is relevant) on a set of data containing 0.1% "Yes" and
>> 99.9% "No", using several attributes (12 for now, but I have to tune
>> that). I use Weka, which is totally awesome.
>>
>> My goal is to prune the search space for another application (i.e.
>> remove, say, 80% of the data that is very unlikely to be "Yes");
>> that's why I'm trying to use a decision tree. Of course, some
>> algorithms return a one-leaf tree tagged "No", with 99.9% precision,
>> which is pretty accurate, but that ensures I will always discard all
>> of my search space rather than prune it.
>>
>> My question is: is there a way (an algorithm? a piece of software?)
>> to build a tree that maximises recall (all "Yes" elements tagged
>> "Yes" by the algorithm)? I don't really care about precision (it's
>> OK if many "No" elements are tagged "Yes" -- I can handle false
>> positives).
>>
>> In other words, is there a way to build a decision tree under the
>> constraint of 100% recall?
>>
>> I'm not sure I made myself clear, and I'm not sure there are
>> solutions
>> for my problem.
>>
>> Regards,
>>
>> --
>> Emmanuel
>>
>>
>>
>> _______________________________________________
>> Corpora mailing list
>> Corpora at uib.no
>> http://mailman.uib.no/listinfo/corpora
>>
>
>
>
> --
> Edward J. L. Bell
> C28, Computing Department,
> Infolab 21, Lancaster University
>
> +44(0) 15245 10348
>


