On Thu, 31 Jan 2019, 14:28 Koos Wilt <kooswilt at gmail.com> wrote:
> Try the Python NLTK tagger and lemmatizer. They are easily integrated
> with the other NLTK stuff, which constitutes a great package.
>
> On Thu, 31 Jan 2019 at 15:09, Michael Ustaszewski <
> Michael.Ustaszewski at uibk.ac.at> wrote:
>
>> Re: Searching an easy-to-train Lemmatizer and POS tagger for Kyrgyz
>>
>> Dear Jörg,
>>
>> regarding your question about trainable POS taggers and lemmatizers for
>> the Kyrgyz language: OpenNLP provides training interfaces for each of
>> its modules (see
>> https://opennlp.apache.org/docs/1.9.1/manual/opennlp.html).
>> Alternatively, you may consider the IXA pipes
>> (http://ixa2.si.ehu.es/ixa-pipes/), which are based on OpenNLP and which
>> provide exactly what you are looking for: easily trainable,
>> language-independent tools, so you can train your own models for
>> tokenisation, lemmatisation, POS-tagging, NERC, and so on. Of course,
>> you need suitable training corpora. Several input formats are supported
>> by the IXA pipes training module. However, I am not aware of any
>> training corpora for the Kyrgyz language. In the Universal Dependencies
>> repository (https://universaldependencies.org/), I have seen that Kyrgyz
>> is one of the upcoming languages.
>>
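A side note on the training data mentioned above: as far as I recall from the OpenNLP manual, the POS tagger trainer reads one sentence per line, with each token written as word_tag. Below is a minimal Python sketch of producing such text from (token, tag) pairs. The toy Kyrgyz sentence, the UD-style tag names, and the function name to_opennlp_pos_line are my own illustrative assumptions, not taken from a real corpus or from OpenNLP itself.

```python
# Sketch: serialise a hand-tagged corpus as "word_tag word_tag ..." lines,
# one sentence per line (the layout assumed here for OpenNLP POS training).

def to_opennlp_pos_line(sentence):
    """Join (token, tag) pairs as 'token_tag', separated by single spaces."""
    return " ".join(f"{token}_{tag}" for token, tag in sentence)


# Hypothetical toy data: one hand-made sentence, not from a real corpus.
toy_corpus = [
    [("Мен", "PRON"), ("китеп", "NOUN"), ("окудум", "VERB"), (".", "PUNCT")],
]

# One line per sentence, with a trailing newline at the end of the file.
training_text = "\n".join(to_opennlp_pos_line(s) for s in toy_corpus) + "\n"
print(training_text, end="")
```

The resulting text would be saved as UTF-8 and passed to the trainer; please check the exact format against the manual linked above before relying on it.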
>> As far as I know, UDPipe (http://ufal.mff.cuni.cz/udpipe) can also be
>> trained with your own corpora. There is also an implementation of UDPipe
>> in R, which you may use to train your own models (see
>> https://cran.r-project.org/web/packages/udpipe/vignettes/udpipe-annotation.html).
>>
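For the UDPipe route, the training corpus has to be in CoNLL-U format: ten tab-separated columns per token (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), with "_" for unspecified values and a blank line after each sentence. Here is a rough Python sketch that writes (form, lemma, tag) triples in that layout. The toy sentence and the function name to_conllu are hypothetical, chosen only for illustration.

```python
# Sketch: write (form, lemma, upos) triples as CoNLL-U token lines.
# Columns that are not annotated (XPOS, FEATS, HEAD, DEPREL, DEPS, MISC)
# are filled with "_", the CoNLL-U marker for an unspecified value.

def to_conllu(sentence):
    """Return one sentence as CoNLL-U text, terminated by a blank line."""
    lines = []
    for index, (form, lemma, upos) in enumerate(sentence, start=1):
        columns = [str(index), form, lemma, upos, "_", "_", "_", "_", "_", "_"]
        lines.append("\t".join(columns))
    return "\n".join(lines) + "\n\n"


# Hypothetical toy data: one hand-made sentence, not from a real corpus.
toy_sentence = [
    ("Мен", "мен", "PRON"),
    ("китеп", "китеп", "NOUN"),
    ("окудум", "оку", "VERB"),
    (".", ".", "PUNCT"),
]

print(to_conllu(toy_sentence), end="")
```

Since HEAD and DEPREL are left empty here, a file like this could only serve for training the tagging and lemmatisation components; a parser would need real dependency annotation.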
>> There are probably many more trainable NLP tools out there that might
>> meet your requirements; the ones mentioned above are those that I know
>> and that I found easy to use.
>>
>> Best wishes,
>> Michael
>>
>> On 31.01.2019 at 11:25, corpora-request at uib.no wrote:
>> > ------------------------------
>> >
>> > Message: 2
>> > Date: Wed, 30 Jan 2019 12:34:36 +0100
>> > From: Jörg Knappen <j.knappen at mx.uni-saarland.de>
>> > Subject: [Corpora-List] Searching an easy-to-train Lemmatizer and POS
>> > tagger for Kyrgyz
>> > To: corpora at uib.no
>> >
>> >
>> > I am searching for some tools usable for lemmatising and POS-tagging
>> > Kyrgyz. Kyrgyz is a Turkic language (agglutinative) written with the
>> > Cyrillic alphabet. I don't expect pre-trained tools to be out there
>> > (if there is one, it would be great), but I hope to find something
>> > that can be trained easily (not needing too much training data).
>> >
>> > Thanks in advance,
>> >
>> > Jörg Knappen
>>
>>
>> _______________________________________________
>> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>> Corpora mailing list
>> Corpora at uib.no
>> https://mailman.uib.no/listinfo/corpora
>>