We are preparing a project on lexico-semantic analyses of 18th- and 19th-century __English-language__ texts from different written genres: __essays, literary texts, letters and diaries__. It's (mainly) British English.
I'd like to know the state of the art:
- Which out-of-the-box taggers (TreeTagger, Perceptron, TnT, Stanford, CLAWS, etc.) perform best on this type of data?
- Which tagger architectures are likely to be best suited (HMM, maximum entropy, CRF, etc.)?
- Are there any historical or genre-specific language models available?
- What about tokenization and orthographic normalization: is either an issue for British English of that period?
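For illustration only, here is a minimal rule-based sketch of what "orthographic normalization" might mean in practice before tagging: replacing the long s (ſ) and mapping a few period spellings to modern forms. The variant list (`VARIANTS`) and the function `normalize` are hypothetical examples chosen by hand, not a validated resource or any existing tool's API.

```python
import re

# Hypothetical, hand-picked variant map for illustration only;
# a real project would need a curated or learned resource.
VARIANTS = {
    "shew": "show",
    "shewn": "shown",
    "compleat": "complete",
    "connexion": "connection",
}

# Simple tokenizer: word characters (keeping internal apostrophes,
# e.g. "don't") or any single non-space punctuation character.
TOKEN_RE = re.compile(r"\w+(?:'\w+)*|[^\w\s]")


def normalize(text):
    """Tokenize text and map known historical spellings to modern ones."""
    # Replace the long s (U+017F) with a round s before tokenizing.
    text = text.replace("\u017f", "s")
    out = []
    for tok in TOKEN_RE.findall(text):
        modern = VARIANTS.get(tok.lower())
        if modern is None:
            out.append(tok)
            continue
        # Preserve an initial capital from the original token.
        if tok[0].isupper():
            modern = modern[0].upper() + modern[1:]
        out.append(modern)
    return out


if __name__ == "__main__":
    print(normalize("He did \u017fhew the Connexion."))
```

A lookup table like this only covers variants someone has already listed; whether such normalization improves tagger accuracy on these genres is exactly the open question above.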
Any kind of pointer and/or assessment is welcome.
Many thanks!
Very best, Berenike