[Corpora-List] BNC n-grams

Mark Davies Mark_Davies at byu.edu
Tue Nov 10 14:18:26 CET 2009


Serge,


>> I did this quite some time ago, but I never thought of this as an achievement, since it's trivial to produce.

And I wasn't saying that it would be difficult to produce. I could generate the full 2-grams or 3-grams list from the BYU-BNC databases in about one minute. I just wanted to know whether it had already been done, and whether people would find the data useful. Based on the lack of responses, it looks like it wouldn't be all that useful.


> In case you need them, http://corpus.leeds.ac.uk/frqc/bnc-bi.gz (it's based on lemmas, but I didn't use POS tags).

16 Paz .
16 pay Yeah
16 pay we
16 pay twelve
16 , payroll

This is nice, but I think it really needs the lemma, word form, and PoS for each bigram. A PoS search like "being VVD" or "NN* NN*" would be impossible with this bigram list. Even a search like "being *" (being considered, being asked) would be impossible, since the list contains only lemmas.
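To make the point concrete, here is a minimal Python sketch of why each bigram slot needs word form, lemma, and PoS together. Everything in it is an assumption made for illustration: the toy sentence, its CLAWS-style tags, the function names, and the query convention (an all-uppercase slot such as NN* is matched against the PoS tag, anything else against the word form). It is not how the BYU-BNC databases are implemented.

```python
from collections import Counter
from fnmatch import fnmatchcase

# Hypothetical toy corpus: one (word form, lemma, PoS) triple per token.
# The tags are illustrative CLAWS-style labels, not real BNC output.
TOKENS = [
    ("the", "the", "AT0"), ("company", "company", "NN1"),
    ("was", "be", "VBD"), ("being", "be", "VBG"),
    ("considered", "consider", "VVN"), ("while", "while", "CJS"),
    ("staff", "staff", "NN0"), ("were", "be", "VBD"),
    ("being", "be", "VBG"), ("asked", "ask", "VVN"),
    ("about", "about", "PRP"), ("the", "the", "AT0"),
    ("car", "car", "NN1"), ("park", "park", "NN1"),
]


def bigram_counts(tokens):
    """Count adjacent token pairs, keeping all three fields per slot."""
    return Counter(zip(tokens, tokens[1:]))


def slot_matches(slot, token):
    """Assumed convention: uppercase slots test the PoS, others the word form."""
    word, _lemma, pos = token
    if slot.isupper():
        return fnmatchcase(pos, slot)
    return fnmatchcase(word.lower(), slot.lower())


def search(counts, query):
    """Return {"word1 word2": frequency} for a two-slot query."""
    first, second = query.split()
    hits = Counter()
    for (left, right), freq in counts.items():
        if slot_matches(first, left) and slot_matches(second, right):
            hits[f"{left[0]} {right[0]}"] += freq
    return dict(hits)


counts = bigram_counts(TOKENS)
print(search(counts, "NN* NN*"))
print(search(counts, "being *"))
print(search(counts, "being VVN"))
```

In a lemma-only list, "being considered" and "was considered" would both be stored as "be consider", so neither the word-form slot nor the PoS slot could be tested.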

Anyway, it looks like the question is answered -- thanks.

Mark D.

============================================
Mark Davies
Professor of (Corpus) Linguistics
Brigham Young University
(phone) 801-422-9168 / (fax) 801-422-0906

http://davies-linguistics.byu.edu

** Corpus design and use // Linguistic databases **
** Historical linguistics // Language variation **
** English, Spanish, and Portuguese **
============================================


