[Corpora-List] agent and patient probabilities
ulrike at CoLi.Uni-SB.DE
Wed Jan 24 15:45:01 CET 2007
> For some experiments, we need agent-verb-patient triples where the
> "goodness" of the agents and patients to the verb vary in strength.
> Typical ways to develop materials for such studies is by having human
> subjects rate how "good" various items are as agents and patients
> for particular verbs (e.g., "how likely is a dog to walk?", "how
> likely is a dog to be walked?"). While this works well, it's of
> course very labor (and subject) intensive. So I'm hoping to automate
Philip Resnik's work is definitely an excellent place to look.
Beyond that, my work on modelling human language processing might also
be of interest to you. One large part of my PhD work (the thesis was
submitted recently) was to build a model that predicts human judgements
about the plausibility of verb-argument-relation triples.
The key differences from Resnik's work are a generative formulation
(i.e., plausible roles and arguments can be generated straightforwardly
given a verb) and the use of thematic roles to define the relation
between verb and argument. We tested the model against norming data from
the literature (e.g., McRae et al. 1998, Trueswell et al. 1994) and
against norms we elicited ourselves for verb-argument-role triples
extracted from corpora.
Details can be found in
U. Pado, M. Crocker and F. Keller, Modelling Semantic Role Plausibility
in Human Sentence Processing. EACL, Trento, 2006.
U. Pado, F. Keller and M. Crocker, Combining Syntax and Thematic Fit in
a Probabilistic Model of Sentence Processing. CogSci, Vancouver, 2006.
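To give a rough idea of what a generative formulation over
verb-role-argument triples buys you, here is a toy sketch. It is not the
model from the papers above: it is a plain relative-frequency estimate
with no smoothing, and the triples, role labels, and function names
(estimate, plausibility, generate) are all invented for illustration.

```python
from collections import Counter

# Invented toy data: (verb, role, argument) triples. A real study would
# extract these from a role-annotated corpus.
TRIPLES = [
    ("walk", "Agent", "dog"),
    ("walk", "Agent", "owner"),
    ("walk", "Agent", "owner"),
    ("walk", "Patient", "dog"),
    ("walk", "Patient", "dog"),
    ("walk", "Patient", "dog"),
    ("arrest", "Agent", "cop"),
    ("arrest", "Patient", "crook"),
]


def estimate(triples):
    """Relative-frequency estimate of the joint P(verb, role, argument)."""
    counts = Counter(triples)
    total = sum(counts.values())
    return {triple: count / total for triple, count in counts.items()}


def plausibility(model, verb, role, arg):
    """Joint probability of a triple; unseen triples get 0.0 (no smoothing)."""
    return model.get((verb, role, arg), 0.0)


def generate(model, verb):
    """Rank (role, argument) pairs for a verb by P(role, argument | verb)."""
    verb_mass = sum(p for (v, _, _), p in model.items() if v == verb)
    if verb_mass == 0:
        return []  # verb never seen in the toy data
    ranked = [((r, a), p / verb_mass)
              for (v, r, a), p in model.items() if v == verb]
    return sorted(ranked, key=lambda item: -item[1])


if __name__ == "__main__":
    model = estimate(TRIPLES)
    print(plausibility(model, "walk", "Patient", "dog"))
    print(generate(model, "walk"))
```

Because the estimate is a joint distribution, the same table answers both
"how plausible is this triple?" and "which roles and arguments are likely
for this verb?". Unseen triples get probability zero here, so any
realistic version would need some form of smoothing.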
In the thesis, I also compare our model to Philip Resnik's and two other
selectional preference models on the norming data sets mentioned above.
I replicate Resnik's original successful evaluation, but our model tends
to do slightly better at predicting plausibility judgements across the
different data sets.
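One common way to check how well model scores predict plausibility
judgements is a rank correlation between the two. The sketch below
computes Spearman's rho in plain Python; treat the choice of measure as
an assumption for illustration, since I don't spell out the evaluation
measure here, and note that the function names and example numbers are
invented.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Invented numbers: model scores and mean human ratings for five items.
    model_scores = [0.375, 0.125, 0.25, 0.01, 0.05]
    human_ratings = [6.5, 3.0, 5.5, 1.2, 4.0]
    print(spearman(model_scores, human_ratings))
```

A rank-based measure is convenient here because model probabilities and
rating scales are on very different numeric ranges.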
If you'd like more information or have any questions, please let me know :)
> I know about the Penn Treebank; are there better and/or less
> expensive options for US English, or is this just the way to go?
It might be worthwhile to use a role-annotated corpus to make sure you
really catch the verb-argument relations you're after.
PropBank (role annotations on parts of the Penn Treebank) is the
largest role-annotated corpus available, and it's American English, but
you may want to have a look at the FrameNet corpus as well. It's a
subset of the British National Corpus, and therefore much more balanced
in vocabulary. For example, I find that its vocabulary is closer to
"typical" psycholinguistic items than that of PropBank, which is biased
towards financial language.
The FrameNet home page is at http://framenet.icsi.berkeley.edu/, and if
I understand correctly, the corpus is free for research purposes.