[Corpora-List] Variant verbal government extraction
Mihail.Kopotev at Helsinki.fi
Fri Feb 23 15:09:01 CET 2007
Thank you, Adam.
That's the approach we had in mind too. But the problem seems to be more
complicated.
First, let's keep in mind that we're working with a language that has
rich noun morphology. So the right periphery of a verb can be
realized as either a prepositional phrase or a noun phrase (where more
than one case is possible).
Let me give an example to illustrate this.
In Russian one can say:
strelyat' po utkam / utok/ v utok,
or, literally, in English:
to shoot at ducks / ducks / into ducks ‘to shoot at ducks’
So, even if we have a list of verbs, we can hardly search for all
possible prepositional phrases and noun phrases in which the same noun
(‘duck’ in our example) can appear in a variety of forms (the
accusative and two PPs in our example).
In other words, using the algorithm you suggested
… Find how often it occurs in pattern <VERB PRONOUN>
Find how often it occurs in pattern <VERB to PRONOUN> …
we will have to check all nouns and pronouns in all cases, as well as
all possible PPs, in the position of PRONOUN.
That would take a lot of time to accomplish. Is there any other
way to collect all these verbs?
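
To make the difficulty concrete, here is a minimal sketch of one way the
pattern counting could be generalized, assuming a morphologically tagged
corpus is available. Instead of listing every <VERB X> pattern in advance,
it labels whatever complement directly follows each verb (a bare NP with
its case, or a preposition plus NP with its case) and counts those labels
per verb. The token layout, the tag names, the toy sentences and the score
(the relative frequency of the second most frequent frame) are all
assumptions made for illustration; none of them comes from Sketch Engine
or from any existing tagset.

```python
from collections import Counter, defaultdict

# Hypothetical token layout: (form, lemma, pos, case).
# case is None for words that carry no case tag.

def complement_frame(tokens, i):
    """Label the complement directly after the verb at index i, or return None."""
    if i + 1 >= len(tokens):
        return None
    _, lemma, pos, case = tokens[i + 1]
    if pos in ("NOUN", "PRON"):
        return f"NP.{case}"                      # bare noun phrase, e.g. NP.acc
    if pos == "PREP" and i + 2 < len(tokens):
        _, _, pos2, case2 = tokens[i + 2]
        if pos2 in ("NOUN", "PRON"):
            return f"{lemma}+NP.{case2}"         # PP, e.g. po+NP.dat
    return None

def frame_counts(sentences):
    """Count overall verb frequency and complement frames per verb lemma."""
    verb_freq = Counter()
    frames = defaultdict(Counter)
    for tokens in sentences:
        for i, (_, lemma, pos, _) in enumerate(tokens):
            if pos == "VERB":
                verb_freq[lemma] += 1
                frame = complement_frame(tokens, i)
                if frame is not None:
                    frames[lemma][frame] += 1
    return verb_freq, frames

def rank_variant_verbs(verb_freq, frames):
    """Sort verbs by the relative frequency of their second most common frame.

    The score is an assumed stand-in for "a statistic": it is high only
    when at least two frames are frequent relative to the verb itself.
    """
    scored = []
    for verb, counter in frames.items():
        rel = sorted((n / verb_freq[verb] for n in counter.values()), reverse=True)
        score = rel[1] if len(rel) > 1 else 0.0
        scored.append((verb, score, dict(counter)))
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Toy, hand-tagged data in transliteration (illustrative only).
corpus = [
    [("strelyal", "strelyat'", "VERB", None), ("po", "po", "PREP", None),
     ("utkam", "utka", "NOUN", "dat")],
    [("strelyal", "strelyat'", "VERB", None), ("utok", "utka", "NOUN", "acc")],
    [("strelyal", "strelyat'", "VERB", None), ("v", "v", "PREP", None),
     ("utok", "utka", "NOUN", "acc")],
    [("chital", "chitat'", "VERB", None), ("knigu", "kniga", "NOUN", "acc")],
    [("chital", "chitat'", "VERB", None), ("gazetu", "gazeta", "NOUN", "acc")],
]

verb_freq, frames = frame_counts(corpus)
ranking = rank_variant_verbs(verb_freq, frames)
for verb, score, counts in ranking:
    print(verb, round(score, 2), counts)
```

This only looks at the one or two tokens right after the verb, so it would
miss complements separated from the verb by adverbs or word-order
variation, and it depends entirely on the quality of the case tagging.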
Department of Slavonic
and Baltic Languages and Literatures
University of Helsinki
Adam Kilgarriff wrote:
> The algorithm you want is:
>   In a large corpus
>     For each verb
>       Find how often it occurs in pattern <VERB PRONOUN>
>       Find how often it occurs in pattern <VERB to PRONOUN>
>       Compute a statistic to see how high both these numbers are,
>       relative to overall freq of verb
>     Sort verbs according to the statistic
> Now you have a starter set for examining which verbs show the
> behaviour you want to investigate.
> All relevant frequencies are available for, eg, the BNC, in the Sketch
> Engine http://www.sketchengine.co.uk
> where you can define the patterns in CQL (Corpus Query Language from
> Stuttgart Uni). We don’t currently have a nice web interface for
> robots but will have one shortly. In the meantime, ask us and we can
> set things up to help you (eg by allowing you robot access; you’d then
> need to scrape web pages).
> -----Original Message-----
> *From:* owner-corpora at lists.uib.no [mailto:owner-corpora at lists.uib.no]
> *On Behalf Of *Mikhail Kopotev
> *Sent:* 22 February 2007 13:15
> *Cc:* CORPORA at UIB.NO
> *Subject:* [Corpora-List] Variant verbal government extraction
> Dear all,
> does anyone know how to recognize and extract variations of verbal
> government such as ‘to write you/to you’ from a corpus?
> Since I am interested in Russian morphosyntactic changes, I would
> like you to point me to any tools or methods, rather than obtained
> results, concerning English or any other relevant ;) languages.
> Many thanks,
> Mikhail Kopotev
> Department of Slavonic
> and Baltic Languages and Literatures
> University of Helsinki