[Corpora-List] State of the Art: historical POS-Tagging EN?

Herrmann, Berenike jb.herrmann at phil.uni-goettingen.de
Thu Jun 1 16:33:22 CEST 2017


Dear all,

We are preparing a project on lexico-semantic analyses of 18th/19th-century __English-language__ texts from different written genres: __essays, literary texts, and also letters and diaries__. The texts are (mainly) British English.

I'd like to know the state of the art:

- Which out-of-the-box taggers (TreeTagger, Perceptron, TnT, Stanford, CLAWS, etc.) perform best on this type of data?

- Which tagger types are likely to be best suited (HMM, maximum entropy, CRF, etc.)?

- Are there any historical/genre-specific language models available?

- How about tokenization and orthographic normalization: is either an issue for British English of that period?

Any kind of pointer and/or assessment is welcome.

Many thanks!

Very best, Berenike

https://jberenike.github.io/


