[Corpora-List] Corpus Benevolence
adam at lexmasterclass.com
Sat Feb 10 11:58:00 CET 2007
I'd say that the questions explored here
- how do you describe a corpus?
- how do you compare corpora?
- how do you decide what is suitable for a particular task?
are the meatiest, juiciest and most important in our field.
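One common way to put a number on "how do you compare corpora?" is to compare word-frequency profiles with a chi-square statistic. The sketch below loosely follows Kilgarriff's chi-square approach, but it is only an illustration. The naive whitespace tokenizer, the `top_n` cut-off and the function names are assumptions. Lower scores mean the two corpora use their most frequent words in more similar proportions.

```python
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption; use a real tokenizer in practice).
    return text.lower().split()


def chi_square_distance(tokens_a, tokens_b, top_n=500):
    """Chi-square over the top_n most frequent words of both corpora combined.

    Lower values mean more similar word-frequency profiles. The score is not
    normalised, so only compare scores computed with the same top_n.
    """
    if not tokens_a or not tokens_b:
        raise ValueError("both corpora must contain at least one token")
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    size_a, size_b = len(tokens_a), len(tokens_b)
    total = size_a + size_b
    combined = freq_a + freq_b
    score = 0.0
    for word, joint in combined.most_common(top_n):
        # Expected counts if both corpora drew words from one shared distribution.
        exp_a = joint * size_a / total
        exp_b = joint * size_b / total
        score += (freq_a[word] - exp_a) ** 2 / exp_a
        score += (freq_b[word] - exp_b) ** 2 / exp_b
    return score
```

For example, `chi_square_distance(tokenize("a b"), tokenize("a b"))` gives `0.0`, while two corpora with no words in common score higher.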
From: owner-corpora at lists.uib.no [mailto:owner-corpora at lists.uib.no] On Behalf Of Alexander Osherenko
Sent: 10 February 2007 10:39
To: Santos Diana; corpora at hd.uib.no
Subject: Re: [Corpora-List] Corpus Benevolence
Thank you for your comments. I was already starting to think I was going mad with my
>- on the contrary, if you want to look for the best corpus to test
>something that you have developed and are not sure holds water in other
>conditions, you'd better choose the most different corpus possible (from
>your initial one)
I do know that the results would be suitable if I used a different
corpus for testing, since there are always plenty of reasons to explain away
bad results. ;-) Perhaps I should explain my ideas as follows: I have a small
corpus, which is why I want to extend it. When do I stop extending it? When
the corpus is big enough, but what does "big enough" mean?
In my case (opinion mining), does "big enough" correspond to the number
of "opinionated" expressions?
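One way to make "big enough" concrete is to track how many distinct opinionated expressions have been seen as documents are added, and to stop extending when a new batch adds almost nothing. This is a minimal sketch under stated assumptions: `is_opinionated` stands in for whatever opinion lexicon or classifier is used (the toy word set in the usage note is hypothetical), tokenization is naive whitespace splitting, and the `min_gain` threshold is arbitrary.

```python
def growth_curve(documents, is_opinionated, step=100):
    """Return (documents_seen, distinct_opinion_terms) after every `step` documents.

    `documents` is a list of strings; `is_opinionated` is a hypothetical
    predicate deciding whether a token counts as an opinionated expression.
    """
    seen = set()
    curve = []
    for i, doc in enumerate(documents, start=1):
        seen.update(tok for tok in doc.lower().split() if is_opinionated(tok))
        # Record a point every `step` documents, plus one for the final document.
        if i % step == 0 or i == len(documents):
            curve.append((i, len(seen)))
    return curve


def has_saturated(curve, min_gain=0.01):
    """True if the last step added fewer than `min_gain` (relative) new terms."""
    if len(curve) < 2:
        return False
    (_, prev), (_, last) = curve[-2], curve[-1]
    if prev == 0:
        return False  # nothing found yet, so no evidence of saturation
    return (last - prev) / prev < min_gain
```

Usage: with a toy lexicon such as `{"good", "bad", "great"}.__contains__` as the predicate, a flattening curve suggests that further documents mostly repeat expressions already collected. Whether that counts as "big enough" still depends on the task.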
>I think your "general" measure has to be a
That is probably the same as what I meant in my previous comment.
>I have also written something on the subject of validating corpus-based
>Santos, Diana & Signe Oksefjell. "Using a Parallel Corpus to Validate
>Independent Claims", Languages in Contrast, Vol. 2(1), 1999, pp. 117-132.
>[tell me if you want me to send it to you]
Could you please send it to me?
>Hope this is useful,
It was very useful.
>Linguateca, SINTEF ICT
>Pb 124 Blindern, N-0314 Oslo, Norway
>>From: owner-corpora at lists.uib.no
>>[mailto:owner-corpora at lists.uib.no] On Behalf Of Alexander Osherenko
>>Sent: 8. februar 2007 10:00
>>To: corpora at hd.uib.no
>>Subject: [Corpora-List] Corpus Benevolence
>>Are there any measures that provide a general estimate of the
>>benevolence of a corpus? The problem is this: there are several
>>corpora, domain-specific or not, and I want to
>>find a general measure or general hints for choosing one or
>>another. How can I estimate which corpus to take, other than
>>calculating result measures, whatever they are, and comparing them
>>for every corpus chosen at random beforehand?
>>Something like size, number of sentences, genre...
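The simpler descriptive measures mentioned here (size, number of sentences) are easy to compute for each candidate corpus. Genre, by contrast, is metadata rather than something read off the text. The sketch below is an illustration only. The regex sentence splitter (which abbreviations will break), the `\w+` tokenizer and the choice of type-token ratio as an extra measure are assumptions.

```python
import re


def corpus_profile(text):
    """Basic size statistics for one corpus given as a single string."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Naive tokenizer: runs of word characters, lowercased.
    tokens = re.findall(r"\w+", text.lower())
    types = set(tokens)
    return {
        "tokens": len(tokens),
        "sentences": len(sentences),
        "types": len(types),
        "type_token_ratio": len(types) / len(tokens) if tokens else 0.0,
        "mean_sentence_length": len(tokens) / len(sentences) if sentences else 0.0,
    }
```

The type-token ratio falls as a corpus grows, so it is only comparable across corpora if it is computed on samples of equal size.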