[Corpora-List] State of the Art: historical POS-Tagging EN?

Yi Yang yiyangnlp at gmail.com
Thu Jun 1 18:33:55 CEST 2017


Hi Berenike,

You may be interested in this paper:

https://yiyangnlp.github.io/downloads/yang-naacl-2016.pdf

Best, Yi

On Thu, Jun 1, 2017 at 10:33 AM, Herrmann, Berenike <jb.herrmann at phil.uni-goettingen.de> wrote:


> Dear all,
>
> We are preparing a project on lexico-semantic analyses of 18th/19th
> century English-language texts from different written genres: essays,
> literary texts, as well as letters and diaries. The texts are (mainly)
> British English.
>
> I'd like to know the state of the art:
>
> - Which out-of-the-box taggers (TreeTagger, Perceptron, TnT, Stanford,
> CLAWS, etc.) perform best on this type of data?
>
> - Which tagger types are likely to be best suited (HMM, maximum entropy,
> CRF, etc.)?
>
> - Are there any historical/genre-specific language models available?
>
> - What about tokenization and orthographic normalization: is either an
> issue for British English of that period?
>
> Any pointers and/or assessments are welcome.
>
> Many thanks!
>
> Very best,
> Berenike
>
> https://jberenike.github.io/
>