[Corpora-List] Ukrainian tagging parameters

Mariana Romanyshyn mariana.scorp at gmail.com
Tue Dec 10 23:49:49 CET 2019


Hi Daniel,

You might find these tools useful:

- https://stanfordnlp.github.io/stanfordnlp/ has POS tagging,
  lemmatization, and dependency parsing models for Ukrainian trained on
  Universal Dependencies.
- https://github.com/kmike/pymorphy2 uses
  https://github.com/brown-uk/dict_uk to output all possible lemmas and
  parts of speech for a word in Ukrainian, but it doesn't disambiguate
  - please install using
    `pip install git+https://github.com/kmike/pymorphy2.git pymorphy2-dicts-uk`
  - (It doesn't work for the Ukrainian language if you install it
    directly via pip.)
- In case you need a simple tokenizer, you can use
  https://github.com/lang-uk/tokenize-uk.
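
To make the difference between the first two tools concrete, here is a minimal sketch, not a tested recipe. It assumes pymorphy2 was installed from git as described above and that the Ukrainian stanfordnlp models have already been downloaded. The helper names (`candidate_analyses`, `tagged_tokens`, `load_tools`) are my own and are not part of either library.

```python
def candidate_analyses(morph, word):
    """Return every distinct (lemma, tag) pair the analyzer proposes for `word`.

    `morph` is expected to be a pymorphy2.MorphAnalyzer. Nothing here picks
    a single reading, which mirrors the "doesn't disambiguate" caveat.
    """
    return sorted({(p.normal_form, str(p.tag)) for p in morph.parse(word)})


def tagged_tokens(nlp, text):
    """Return one (form, lemma, UPOS) triple per token, chosen in context.

    `nlp` is expected to be a stanfordnlp.Pipeline built for Ukrainian.
    """
    doc = nlp(text)
    return [(w.text, w.lemma, w.upos) for s in doc.sentences for w in s.words]


def load_tools():
    """Build both tools; requires the packages and models to be installed."""
    import pymorphy2    # assumed installed from git, see the pip command above
    import stanfordnlp  # assumed models fetched once via stanfordnlp.download("uk")
    return pymorphy2.MorphAnalyzer(lang="uk"), stanfordnlp.Pipeline(lang="uk")
```

After `morph, nlp = load_tools()`, calling `candidate_analyses(morph, word)` lists all readings of a single word out of context, while `tagged_tokens(nlp, sentence)` gives one lemma and one POS tag per token, which is closer to what TreeTagger would produce.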

Best regards, Mariana Romanyshyn

On Thu, 21 Nov 2019 at 14:05, Vladimír Benko <vladimir.benko at juls.savba.sk> wrote:


> Dear Daniel,
>
> You may want to try to train the TreeTagger yourself using the Ukrainian
> Treebank available from the Universal Dependencies site. Alternatively,
> you also can tag your corpus by UDPipe with the language model trained on
> that treebank.
>
> Best,
>
> Vlado B, 12:55
>
> Dear colleagues,
>
> Does anyone know if Ukrainian parameters exist for TreeTagger (there's no
> mention of them on the website), or if there's another tagger similar to
> TreeTagger that could add POS and Lemma tags to Ukrainian?
>
> Thanks in advance for any help.
>
> Best regards,
> --
> Daniel HENKEL <https://univ-paris8.academia.edu/DanielHENKEL>
>
> *Maître de Conférences (Linguistique et Traduction) UFR5 LLCE-LEA • EA1569
> TransCrit*
> Université Paris 8 Vincennes-St-Denis
>
>
> *“one cannot draw up a typology of translations, but at most a
> typology of different ways of translating, each time negotiating the goal
> one sets oneself, and each time discovering that there are more ways of
> translating than we suspect.”* U. Eco
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> https://mailman.uib.no/listinfo/corpora
>
>
> --
> Vladimír Benko
>
> Slovak Academy of Sciences
> Ľ. Štúr Institute of Linguistics
> Panská 26, SK-81101 Bratislava
>
> Tel +421-2-54431762 Fax -54431756
>
> http://aranea.juls.savba.sk/guest/
> https://www.facebook.com/araneawebcorpora/