> more proper nouns in news paper text than in fiction
Certainly true. In general, the more formal/informational a text is, the more nominal it is, with more nouns, adjectives and determiners; the more informal/interactional it is, the more verbs and pronouns it has. Fiction and newspaper text are both noteworthy for past tenses and 3rd-person pronouns.
Mark Davies and Andrew Hardie have already mentioned Doug Biber's work. I'll just add what I think of as the key/original reference: his "Variation across Speech and Writing", CUP 1988.
Sketch Engine supports this kind of research: for any tagged corpus, in any language, the 'wordlist' functionality lets you easily contrast POS-tag frequencies between corpora or subcorpora.
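For anyone who wants to try the same kind of contrast outside Sketch Engine, here is a minimal Python sketch. It is an illustration only and not Sketch Engine's implementation: the function names (`pos_relative_freqs`, `contrast`), the tag labels, the toy sentences and the smoothing constant are all assumptions made for the example. It normalises tag counts to frequency per thousand tokens, then ranks tags by a smoothed ratio between the two corpora.

```python
from collections import Counter


def pos_relative_freqs(tagged_tokens):
    """Return {tag: frequency per 1,000 tokens} for (word, tag) pairs."""
    counts = Counter(tag for _, tag in tagged_tokens)
    total = sum(counts.values())
    return {tag: 1000 * n / total for tag, n in counts.items()}


def contrast(corpus_a, corpus_b, smoothing=1.0):
    """Rank tags by smoothed ratio of per-thousand frequencies (a over b).

    The smoothing constant (an assumed value) avoids division by zero
    for tags that are missing from one of the corpora.
    """
    fa = pos_relative_freqs(corpus_a)
    fb = pos_relative_freqs(corpus_b)
    ratios = {
        tag: (fa.get(tag, 0.0) + smoothing) / (fb.get(tag, 0.0) + smoothing)
        for tag in set(fa) | set(fb)
    }
    return sorted(ratios.items(), key=lambda kv: kv[1], reverse=True)


# Toy data, hand-tagged for illustration only.
news = [("Reuters", "PROPN"), ("reported", "VERB"), ("that", "SCONJ"),
        ("Volvo", "PROPN"), ("shares", "NOUN"), ("rose", "VERB")]
fiction = [("She", "PRON"), ("looked", "VERB"), ("at", "ADP"),
           ("him", "PRON"), ("and", "CCONJ"), ("smiled", "VERB")]

ranked = contrast(news, fiction)
for tag, ratio in ranked:
    print(f"{tag:6s} {ratio:8.3f}")
```

Tags at the top of the ranking are over-represented in the first corpus, and tags at the bottom are over-represented in the second. With real data you would feed in the output of whatever tagger you use, and a significance test would be needed before reading much into small differences.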
Another favourite reference of mine: Heylighen and Dewaele http://pespmc1.vub.ac.be/Papers/Formality.pdf
My own recent contribution: "Getting to know your corpus" <http://trac.sketchengine.co.uk/attachment/wiki/AK/Papers/Kilgarriff_TSD2012.pdf?format=raw>
in: *Proc. Text, Speech, Dialogue (TSD 2012)*, Lecture Notes in Computer Science. Sojka, P., Horak, A., Kopecek, I., Pala, K. (eds). Springer.
On 12 December 2012 10:00, Karin Cavallin <karin.cavallin at ling.gu.se> wrote:
> Does anyone know of any study of the difference in (and an analysis of the
> reasons) part-of-speech tag distribution in different genres? A quick study
> I made yesterday showed e.g. that my working hypothesis that there are more
> proper nouns in news paper text than in fiction was correct, at least on
> the data I examined.
> Karin Cavallin
> PhD Student in Computational Linguistics
> University of Gothenburg, Sweden
--
Adam Kilgarriff <http://www.kilgarriff.co.uk/>
adam at lexmasterclass.com
Director, Lexical Computing Ltd <http://www.sketchengine.co.uk/>
Visiting Research Fellow, University of Leeds <http://leeds.ac.uk>
*Corpora for all* with the Sketch Engine <http://www.sketchengine.co.uk>
*DANTE: a lexical database for English* <http://www.webdante.com>