[Corpora-List] SAFAR framework third release

Karim BOUZOUBAA bouzoubaa at emi.ac.ma
Mon Jul 19 18:27:46 CEST 2021


The ALELM team is pleased to announce the third release of the SAFAR framework, with new ANLP (Arabic Natural Language Processing) resources, tools and applications. As a brief reminder, SAFAR is a framework dedicated to ANLP. It is cross-platform, modular, and provides an integrated development environment (IDE). It includes:

» Resources needed for different ANLP processes

» Basic language-level modules, namely morphology, syntax and semantics

» ANLP applications

We are also pleased to announce that the SAFAR framework is now open to your contributions. In addition, we have developed specific resources for the Moroccan Arabic dialect. Consequently, the structure of this third version of SAFAR has been changed to accommodate these additions: the top level is now split into Modern Standard Arabic (MSA) and Moroccan Arabic (MA).

The following is the full list of what SAFAR V3 provides, covering both old and new modules (new ones are marked [NEW] at the end of the line):

Modern Standard Arabic

Applications :

Key Words Extractor [NEW]

Light Summarizer

Moajam Moaassir (MSA lexicon Desktop browser)

Moajam Tafaoli (Al wassit lexicon Desktop browser)

Morpho-Syntactic Processor

Stem Counter

Stopwords Analyzer [NEW]

Syntactic parsers :

FARASA POS Tagger [NEW]

SAFAR Light POS Tagger [NEW]

Stanford Parser

FARASA Parser [NEW]

Morphological analyzers :

Alkhalil Morphological Analyzer

Alkhalil 2 Morphological Analyzer [NEW]

BAMA Morphological Analyzer

MADAMIRA Morphological Analyzer

Stemmers :

ISRI Stemmer

Khoja Stemmer

Light10 Stemmer

Motaz Stemmer

SAFAR Stemmer [NEW]

Tashaphyne Stemmer

Lemmatizers :

Alkhalil Lemmatizer [NEW]

FARASA Lemmatizer [NEW]

SAFAR Lemmatizer [NEW]

Utils :

Benchmark for Morphological Analyzers

Benchmark for Stemmers [NEW]

Benchmark for Syntactic Parsers [NEW]

Normalization

Pattern Detection [NEW]

Sentence splitter

Stop Words [NEW]

Tokenization

Transliteration

Resources:

Alphabet

Clitics

Particles lexicon

Al wassit dictionary

CALEM (stems/lemmas) lexicon [NEW]

Contemporary dictionary

Machine Learning :

SAFAR Hidden Markov Model [NEW]

SAFAR Levenshtein distance [NEW]

Weka Lib

FT Lib

Moroccan Arabic

Resources:

Maded lexicon [NEW]

Moralex lexicon [NEW]

The following modules have been removed:

Sentence Processor

Ontology (AWN and extended AWN)

---------

Useful links:

Project main page: <https://lnkd.in/gh-UQXQ> http://arabic.emi.ac.ma/safar/

Download page: <https://lnkd.in/gqrMm4t> http://arabic.emi.ac.ma/safar/?page_id=12

Examples of use: <https://lnkd.in/gCTkYxk> http://arabic.emi.ac.ma/safar/?page_id=14

Online demonstration: http://arabic.emi.ac.ma:8080/SafarWeb/

Publications: <https://lnkd.in/gbhqBM5> http://arabic.emi.ac.ma/safar/?page_id=24

-----------------------------------------------------------------------------------------------

Karim Bouzoubaa, M.Sc, Ph.D د. كريم بوزوبع

Full professor أستاذ جامعي

Department of Computer Science قسم علوم الحاسوب

EMI (Ecole Mohammadia d'Ingénieurs,

Mohammadia School of Engineers) المدرسة المحمدية للمهندسين

Mohammed V University in Rabat جامعة محمد الخامس

Avenue Ibnsina B.P. 765 Agdal شارع ابن سينا ص ب 765 أكدال

Rabat, Morocco الرباط المغرب

Tel: +212 (0) 537 68.71.50 / +212 (0) 537 77.65.66 الهاتف

Fax: +212 (0) 537 77.88.53 الفاكس

karim.bouzoubaa [at] emi.ac.ma

karimbouzoubaa [at] yahoo.com

http://www.emi.ac.ma/bouzoubaa

http://www.emi.ac.ma/alelm

https://www.youtube.com/channel/UCFpBdMiXvofNsSIAxgyaxeA

** Please, consider the environment before printing this email من فضلكم فكروا في البيئة قبل طباعة هذه الرسالة - **
