[Corpora-List] Difference in POS tag distribution in different genres

Adam Kilgarriff adam at lexmasterclass.com
Mon Dec 17 04:24:08 CET 2012


Dear Karin


> more proper nouns in news paper text than in fiction

Certainly true. In general, the more formal/informational a text is, the more nominal it is, with more nouns, adjectives and determiners; the more informal/interactional it is, the more verbs and pronouns it has. Fiction and newspaper text are both noteworthy for past tenses and 3rd-person pronouns.

Mark Davies and Andrew Hardie have already mentioned Doug Biber's work. I'll just add what I think of as the key/original reference, his "Variation across Speech and Writing", CUP 1988.

Sketch Engine supports this kind of research: under the 'wordlist' functionality you can easily compare POS-tag frequencies between corpora or subcorpora (for any tagged corpus, in any language).
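For anyone who wants to try the same comparison outside Sketch Engine, here is a minimal sketch in plain Python. It is not Sketch Engine's API or its exact method; it assumes each corpus is already tagged and available as a list of (word, tag) pairs. The function names, the toy data and the smoothing constant are all made up for illustration.

```python
from collections import Counter


def tag_distribution(tagged_tokens):
    """Return {tag: frequency per 1,000 tokens} for a list of (word, tag) pairs."""
    counts = Counter(tag for _, tag in tagged_tokens)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("corpus is empty")
    return {tag: 1000.0 * n / total for tag, n in counts.items()}


def contrast(tagged_a, tagged_b, smoothing=1.0):
    """Rank tags by how much more frequent they are in corpus A than in corpus B.

    The smoothing constant (an arbitrary choice here) is added to both
    normalised frequencies so that tags missing from one corpus do not
    cause a division by zero.
    """
    dist_a = tag_distribution(tagged_a)
    dist_b = tag_distribution(tagged_b)
    ratios = {
        tag: (dist_a.get(tag, 0.0) + smoothing) / (dist_b.get(tag, 0.0) + smoothing)
        for tag in set(dist_a) | set(dist_b)
    }
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


# Toy data only; real input would come from a tagger run over each corpus.
news = [("Obama", "NNP"), ("visited", "VBD"), ("Berlin", "NNP"), ("yesterday", "NN")]
fiction = [("She", "PRP"), ("walked", "VBD"), ("home", "NN"), ("slowly", "RB")]

for tag, ratio in contrast(news, fiction):
    print(f"{tag}\t{ratio:.2f}")
```

Tags at the top of the list are over-represented in the first corpus, tags at the bottom in the second. On real data you would also want a significance test or at least a minimum frequency threshold before reading much into the ranking.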

Another favourite reference of mine: Heylighen and Dewaele http://pespmc1.vub.ac.be/Papers/Formality.pdf
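Heylighen and Dewaele's paper proposes a formality score (F-score) computed from POS category percentages: nouns, adjectives, prepositions and articles push it up; pronouns, verbs, adverbs and interjections pull it down. A rough sketch of that calculation follows. It assumes the tags have already been mapped to these coarse category names (the names and the input format are illustrative, not taken from the paper), so check the paper itself for the exact definitions before relying on it.

```python
FORMAL = ("noun", "adjective", "preposition", "article")
DEICTIC = ("pronoun", "verb", "adverb", "interjection")


def formality_score(category_counts):
    """Compute an F-score from a {category: token count} mapping.

    Percentages are taken over all tokens in the mapping, including
    categories (e.g. conjunctions) that belong to neither group.
    """
    total = sum(category_counts.values())
    if total == 0:
        raise ValueError("no tokens counted")

    def pct(category):
        return 100.0 * category_counts.get(category, 0) / total

    formal = sum(pct(c) for c in FORMAL)
    deictic = sum(pct(c) for c in DEICTIC)
    return (formal - deictic + 100.0) / 2.0


# Hypothetical counts for a 100-token sample.
sample = {
    "noun": 30, "adjective": 10, "preposition": 10, "article": 10,
    "pronoun": 10, "verb": 20, "adverb": 10,
}
print(formality_score(sample))
```

The score ranges from 0 to 100; higher values indicate a more nominal, formal style.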

My own recent contribution: "Getting to know your corpus" <http://trac.sketchengine.co.uk/raw-attachment/wiki/AK/Papers/Kilgarriff_TSD2012.pdf?format=raw>

in: *Proc. Text, Speech, Dialogue (TSD 2012)*, Lecture Notes in Computer Science. Sojka, P., Horak, A., Kopecek, I., Pala, K. (eds). Springer.

Best,

Adam

On 12 December 2012 10:00, Karin Cavallin <karin.cavallin at ling.gu.se> wrote:


> Does anyone know of any study of the difference in (and an analysis of the
> reasons) part-of-speech tag distribution in different genres? A quick study
> I made yesterday showed e.g. that my working hypothesis that there are more
> proper nouns in news paper text than in fiction was correct, at least on
> the data I examined.
>
> Karin Cavallin
> PhD Student in Computational Linguistics
> University of Gothenburg, Sweden
>

--
========================================
Adam Kilgarriff <http://www.kilgarriff.co.uk/>
adam at lexmasterclass.com
Director, Lexical Computing Ltd <http://www.sketchengine.co.uk/>
Visiting Research Fellow, University of Leeds <http://leeds.ac.uk>
*Corpora for all* with the Sketch Engine <http://www.sketchengine.co.uk>
*DANTE: a lexical database for English* <http://www.webdante.com>
========================================


