>> I did this quite some time ago, but I never thought of this as an achievement, since it's trivial to produce.
And I wasn't saying that it would be difficult to produce. I could generate the full 2-grams or 3-grams list from the BYU-BNC databases in about one minute. I just wanted to know whether it had already been done, and whether people would find the data useful. Based on the lack of responses, it looks like it wouldn't be all that useful.
> In case you need them, http://corpus.leeds.ac.uk/frqc/bnc-bi.gz (it's based on lemmas, but I didn't use POS tags).
16 Paz .
16 pay Yeah
16 pay we
16 pay twelve
16 , payroll
This is nice, but I think that it really does need the lemma, word form, and PoS for each bigram. A PoS search like "being VVD" or "NN* NN*" would be impossible with this bigram list, and so would even "being *" (being considered, being asked), since the list contains only lemmas.
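To make the point concrete, here is a minimal sketch of a bigram list that keeps word form, lemma, and PoS for each slot, so that wildcard queries over any of the three become possible. Everything in it is an assumption for illustration: the sample tokens, the CLAWS-style tags attached to them, the `(field, pattern)` query format, and the helper names `count_bigrams` and `search` are hypothetical and do not reflect how the BYU-BNC databases or the Leeds list are actually built.

```python
from collections import Counter
from fnmatch import fnmatchcase

# Position of each annotation inside a token tuple (hypothetical layout).
FIELDS = {"word": 0, "lemma": 1, "pos": 2}

# Invented sample tokens: (word form, lemma, PoS tag).
tokens = [
    ("bus", "bus", "NN1"),
    ("drivers", "driver", "NN2"),
    ("were", "be", "VBD"),
    ("being", "be", "VBG"),
    ("asked", "ask", "VVN"),
    ("about", "about", "PRP"),
    ("being", "be", "VBG"),
    ("considered", "consider", "VVN"),
]


def count_bigrams(toks):
    """Count adjacent token pairs, keeping all three annotations per slot."""
    return Counter(zip(toks, toks[1:]))


def search(bigrams, first, second):
    """Return {(word1, word2): frequency} for bigrams matching both slots.

    Each slot is a (field, glob) pair, e.g. ("pos", "NN*") or ("word", "being").
    """
    hits = Counter()
    for (a, b), freq in bigrams.items():
        ok_first = fnmatchcase(a[FIELDS[first[0]]], first[1])
        ok_second = fnmatchcase(b[FIELDS[second[0]]], second[1])
        if ok_first and ok_second:
            hits[(a[0], b[0])] += freq
    return dict(hits)


bigrams = count_bigrams(tokens)
noun_noun = search(bigrams, ("pos", "NN*"), ("pos", "NN*"))
being_verb = search(bigrams, ("word", "being"), ("pos", "VV*"))
being_any = search(bigrams, ("word", "being"), ("word", "*"))
```

With a lemma-only list, the word form "being" and the tags would both be missing, so none of the three queries above could be expressed.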
Anyway, it looks like the question is answered -- thanks.
Mark D.
============================================
Mark Davies
Professor of (Corpus) Linguistics
Brigham Young University
(phone) 801-422-9168 / (fax) 801-422-0906

http://davies-linguistics.byu.edu

** Corpus design and use // Linguistic databases **
** Historical linguistics // Language variation **
** English, Spanish, and Portuguese **
============================================