I had a case pretty similar to Emad's. I think that whenever you do sequence classification (NER, POS tagging, vowel restoration, segmentation), leave-one-out just doesn't work.
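To make that concrete, here is a minimal sketch in Python of the kind of setup I have in mind. The function names (group_folds, letter_and_word_accuracy) and the toy labels are my own illustration, not Emad's actual system or any library's API. It assumes each letter carries the id of the word it belongs to, keeps all letters of a word in the same fold, and scores a word as correct only when every one of its letters is correct.

```python
from collections import defaultdict


def group_folds(groups, k):
    """Yield (train_idx, test_idx) pairs for k folds.

    All items sharing a group id (here: all letters of one word)
    are placed in the same fold, so a word is never split between
    training and test data.
    """
    unique = sorted(set(groups))
    for fold in range(k):
        held_out = set(unique[fold::k])  # every k-th word goes to this fold
        test = [i for i, g in enumerate(groups) if g in held_out]
        train = [i for i, g in enumerate(groups) if g not in held_out]
        yield train, test


def letter_and_word_accuracy(gold, pred, groups):
    """Return (per-letter accuracy, per-word accuracy).

    A word counts as correct only if all of its letters are correct.
    """
    word_ok = defaultdict(lambda: True)
    for g, p, w in zip(gold, pred, groups):
        word_ok[w] = word_ok[w] and (g == p)
    letter_acc = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    word_acc = sum(word_ok.values()) / len(word_ok)
    return letter_acc, word_acc


# Toy data (hypothetical): 3 words of 3 letters each, made-up vowel tags.
groups = [0, 0, 0, 1, 1, 1, 2, 2, 2]
gold = list("aiuaiuaiu")
pred = list("aiuaiaaia")  # one wrong letter in word 1, one in word 2

letter_acc, word_acc = letter_and_word_accuracy(gold, pred, groups)
folds = list(group_folds(groups, 3))
```

On this toy data the per-letter accuracy is 7/9 while the per-word accuracy is only 1/3, which is the kind of mismatch Emad describes. Holding out one letter at a time would leave the rest of its word in the training data, so the folds have to be built over whole words (or sentences) instead.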
On Mon, Apr 26, 2010 at 6:40 AM, Emad Mohamed <emohamed at umail.iu.edu> wrote:
>> Message: 6
>> Date: Mon, 26 Apr 2010 09:20:20 +0100
>> From: "Georgios Paltoglou" <gpalto at gmail.com>
>> Subject: [Corpora-List] Leave-one out vs. 10-fold cross validation
>> To: <Corpora at uib.no>
>> Hello to everyone,
>> I just wanted to ask whether anyone is aware of any formal reasons (e.g.
>> error distribution, decreased validity of results) for opting for 10-fold
>> cross validation instead of leave-one out, apart from the obvious reason
>> that it is more efficient and less time-consuming.
>> My two cents is that leave-one-out seems more realistic, in the sense that
>> if the overall aim of a system is to provide the best classification for
>> examples in an "application environment" given some training data, one
>> would naturally train it on the largest possible training subset.
>> Thank you for your responses.
>> Best regards,
> Hi George,
> I will not answer your question, but will provide a case where leave-one-out
> did not work for me. When I worked with Arabic vowel restoration, the
> classification had to be done at the letter level, but the evaluation had to
> be reported at the word level. Also, there were several settings in which
> the per-letter accuracy did not correlate with the per-word accuracy. In
> some cases, you just don't have a choice.
> Emad Soliman Ali Mohamed
> aka Emad Nawfal
> Doctoral Candidate, Department of Linguistics,
> Indiana University, Bloomington