[Corpora-List] State-of-the-art POS tagging results

Nizar Habash habash at cs.columbia.edu
Tue Nov 18 16:37:53 CET 2008


Please also check the results from the CADIM group at Columbia on morphological disambiguation (POS tagging) for Arabic:

Roth, Ryan, Owen Rambow, Nizar Habash, Mona Diab, and Cynthia Rudin. Arabic Morphological Tagging, Diacritization, and Lemmatization Using Lexeme Models and Feature Ranking. In Proceedings of Association for Computational Linguistics (ACL), Columbus, Ohio. 2008.

Diab, Mona. Towards an optimal POS tag set for Modern Standard Arabic Processing. Recent Advances in Natural Language Processing (RANLP), Borovets, Bulgaria, 2007.

Diab, Mona, Kadri Hacioglu and Daniel Jurafsky. Automated Methods for Processing Arabic Text: From Tokenization to Base Phrase Chunking. Book Chapter. In Arabic Computational Morphology: Knowledge-based and Empirical Methods. Editors Antal van den Bosch and Abdelhadi Soudi. Kluwer/Springer Publications, 2007.

Habash, Nizar and Owen Rambow. Arabic Diacritization through Full Morphological Tagging. In Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics (NAACL HLT 2007); Companion Volume, Short Papers. 2007.

Habash, Nizar and Owen Rambow. Arabic Tokenization, Morphological Analysis, and Part-of-Speech Tagging in One Fell Swoop. In Proceedings of the Conference of American Association for Computational Linguistics (ACL05).

Diab, Mona, Kadri Hacioglu and Daniel Jurafsky. Automatic Tagging of Arabic Text: From Raw Text to Base Phrase Chunks. Proceedings of Human Language Technology-North American Association for Computational Linguistics (HLT-NAACL), 2004.


On Nov 14, 2008, at 9:39 AM, Khalil Simaan wrote:

> Hi,
> Hebrew and Arabic may count under "morphologically complex languages".
> For Hebrew have a look at:
> Roy Bar-Haim, Khalil Sima'an and Yoad Winter. Part-of-Speech Tagging
> of Modern Hebrew Text. In Journal of Natural Language Engineering (J-NLE)
> <http://www.cambridge.org/journals/journal_catalogue.asp?mnemonic=nle>,
> 14(2):223-251, 2008.
> The work was extended to Arabic:
> Saib Mansour, Khalil Sima'an and Yoad Winter. Smoothing a Lexicon-based
> POS tagger for Arabic and Hebrew. In Proceedings of the ACL 2007 Workshop
> on Computational Approaches to Semitic Languages: Common Issues and
> Resources. Prague, Czech Republic, 2007.
> Best regards
> Khalil Sima'an
> University of Amsterdam
> Hrafn Loftsson wrote:
>> Hello all.
>> Can anyone point me to papers presenting state-of-the-art POS tagging
>> results for some morphologically complex languages?
>> In his paper "Morphological Tagging: Data vs. Dictionaries" (2000), Jan
>> Hajic presents an evaluation for Czech, Estonian, Hungarian, Romanian,
>> and Slovene, but I wonder if you know of more recent work.
>> --
>> Regards,
>> Hrafn Loftsson, Ph.D. - www.ru.is/faculty/hrafn
>> Assistant Professor
>> School of Computer Science - www.ru.is/cs
>> Reykjavik University - www.ru.is
>> Please note that this e-mail and attachments are intended for the
>> named addresses only and may contain information that is
>> confidential and privileged. Further information:
>> http://www.ru.is/trunadur
>> _______________________________________________
>> Corpora mailing list
>> Corpora at uib.no
>> http://mailman.uib.no/listinfo/corpora
> --
> ----
> k.simaan at uva.nl
> (old email simaan at science.uva.nl will expire soon).*
> ----
> Khalil Sima'an
> Institute for Logic, Language and Computation (ILLC)
> Universiteit van Amsterdam
> http://staff.science.uva.nl/~simaan
> Tel 0205256573
> email k.simaan at uva.nl