[Corpora-List] NLP-Cube - easy text segmentation, lemmatization, parsing and POS tagging in 50+ languages for Python

Tiberiu Boros boros at adobe.com
Mon Sep 24 16:11:23 CEST 2018

Hi everyone,

We are happy to announce we have released the first stable version of NLP-Cube.

NLP-Cube is a Python package that provides state-of-the-art text segmentation (tokenization and sentence-splitting), lemmatization, POS tagging and dependency parsing for over 50 languages.

The project's repository is https://github.com/adobe/NLP-Cube. Installation is simple: 'pip install nlpcube', and usage is as simple as 'sentences = cube("your text here")'. Here's a 1-minute usage example: https://github.com/adobe/NLP-Cube/blob/master/examples/simple_example.ipynb
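For readers who want to see the shape of the call above without opening the notebook, here is a minimal sketch. It assumes the package exposes `Cube` in `cube.api` with a `load(lang)` method, and that each token carries `index`, `word`, `lemma`, `upos`, `head` and `label` attributes; these names are assumptions to check against the repository's README. `load_cube` and `to_rows` are hypothetical helpers written for this illustration, not part of NLP-Cube.

```python
from types import SimpleNamespace


def load_cube(lang="en"):
    """Return a loaded pipeline. Assumed API: cube.api.Cube and Cube.load()."""
    # Imported lazily so the rest of this sketch runs without the package.
    from cube.api import Cube  # requires: pip install nlpcube
    cube = Cube(verbose=True)
    cube.load(lang)  # assumption: fetches the model for `lang` if needed
    return cube


def to_rows(sentences):
    """Flatten parsed sentences into (index, word, lemma, upos, head, label) tuples."""
    rows = []
    for sentence in sentences:      # one list of tokens per sentence
        for entry in sentence:      # attribute names below are assumptions
            rows.append((entry.index, entry.word, entry.lemma,
                         entry.upos, entry.head, entry.label))
    return rows


# Hand-made stand-in data with the assumed token shape, so the helper can be
# tried without downloading a model. This is not real NLP-Cube output.
stand_in = [[
    SimpleNamespace(index=1, word="Hello", lemma="hello",
                    upos="INTJ", head=0, label="root"),
    SimpleNamespace(index=2, word="!", lemma="!",
                    upos="PUNCT", head=1, label="punct"),
]]
rows = to_rows(stand_in)
```

With the package installed, the same helper would be applied to real output as `to_rows(load_cube("en")("your text here"))`.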

We have released models for: Afrikaans, Ancient-Greek, Arabic, Armenian, Basque, Bulgarian, Buryat, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Gothic, Greek, Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Kazakh, Korean, Kurmanji, Latin, Latvian, North_Sami, Norwegian-Bokmaal, Norwegian-Nynorsk, Old_Church_Slavonic, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swedish, Turkish, Ukrainian, Urdu, Uyghur and Vietnamese. All of these are based on the Universal Dependencies treebanks.

We'll add more help, more examples and advanced usage in the days to come. An NER (named entity recognizer) is also available, and we'll release pre-trained models for it soon. Finally, we'd be very happy to hear your thoughts, requests and feedback.

Thanks,
Tibi
