[Corpora-List] Lexical bundles - and meaningful items...
csblists at telefonica.net
Fri Jul 8 11:08:00 CEST 2005
Dear John and other list members,
Ute Römer said:
"But I suppose that concordances of frequent
3-grams may still lead you to some interesting (and meaningful) 4- and
For lists of 3-word strings as well as longer ones, derived from English
corpora, you might like to look at the following, if you haven't already
Stubbs, Michael and Isabel Barth (2003) "Using recurrent phrases as text
type discriminators: a quantitative method and some findings." Functions of
Language 10(1): 61-104.
For similar data from Spanish, derived from smaller corpora (some as small
as 125,000 words, none bigger than 1 million words), see
Butler, Christopher S. (1997) "Repeated word combinations in spoken and
written text: some implications for Functional Grammar." In C. S. Butler, J.
H. Connolly, R. A. Gatward and R. M. Vismans (eds.) A Fund of Ideas: Recent
Developments in Functional Grammar. Amsterdam: Institute for Functional
Research into Language and Language Use (IFOTT).
[As this is in a rather obscure publication which may be difficult for
people to get hold of, I could send an electronic version to anyone who is
Also, Bengt Altenberg says in the following paper that most of the recurrent
sequences he isolated from the London-Lund Corpus were pretty short, with an
average of 3.15 words, and he gives a lot of examples of phraseologically
interesting 3-word sequences:
Altenberg, Bengt (1998) "On the phraseology of Spoken English: the evidence
of recurrent word combinations." In A. P. Cowie (ed.) Phraseology: Theory,
Analysis, and Applications. Oxford: Clarendon Press.
University of Hanover
Königsworther Platz 1
Phone: +49 (0)511 762 2997
Fax: +49 (0)511 762 2996
E-mail: ute.roemer at anglistik.uni-hannover.de
> -----Original Message-----
> From: owner-corpora at lists.uib.no [mailto:owner-corpora at lists.uib.no] On
> Behalf Of Jenny Eagleton
> Sent: Monday, July 04, 2005 4:46 AM
> To: corpora at uib.no
> Subject: [Corpora-List] Lexical bundles
> ON BEHALF OF PROF. JOHN FLOWERDEW
> DEPARTMENT OF ENGLISH AND COMMUNICATION
> CITY UNIVERSITY OF HONG KONG
> RE: LEXICAL BUNDLES.
> I notice that all of the studies I have read on
> this topic have
> focussed on 4 word bundles and that they have
> all used what I
> would call large corpora i.e. many millions of
> words. The rationale
> seems to be that with 5 word bundles you do not
> get enough to analyse
> and that with three word bundles there are
> probably too many to
> I want to do a study of bundles on a specific
> corpus I have, but
> which only has 600,000 words. To be able to work
> with large numbers
> of bundles, it would therefore make sense to focus
> on 3 word bundles.
> I could do a study on 4 word bundles, but the
> sample would be smaller.
> So my question is, do people see any disadvantages
> in focusing on
> 3-word bundles and, if so, what they might be?
> Looking forward to hearing your responses.