[Corpora-List] NLP labs that have active projects on Persian

Ralf Steinberger ralf.steinberger at jrc.ec.europa.eu
Thu May 24 10:35:24 CEST 2012

Dear Hamid,

At the European Commission’s Joint Research Centre (JRC), we have developed the Europe Media Monitor (EMM) family of applications (http://emm.newsbrief.eu/overview.html), which includes Farsi.

EMM collects Farsi news (together with news in some 50 other languages) and displays it in EMM-NewsBrief and in EMM-MedISys (Medical Information System). If you go to ‘advanced search’, you can display all the news sources monitored. Farsi news items are then classified according to the many EMM categories and, if any are found, they are displayed together with those in the other languages.

In EMM-NewsExplorer (http://emm.newsexplorer.eu/NewsExplorer/home/fa/latest.html), we display the biggest news clusters of any given calendar day (for 20 languages, including Farsi), together with the information we manage to extract. We aim to extract entities (person and organisation names), geo-locations and quotations. We also try to link the Farsi news to news in (a subset of) the other languages and to news published on previous days.

NewsExplorer also collects information found on entities over time and in many languages, and it displays this information on mixed-language pages (e.g. http://emm.newsexplorer.eu/NewsExplorer/entities/en/101358.html for Mahmoud Ahmadinejad).

I do not think our Farsi information extraction tools work particularly well, but we intend to put some more effort into the Farsi tools soon.

For an overview of the EMM applications, you can read:

Steinberger, Ralf, Bruno Pouliquen & Erik van der Goot (2009). An introduction to the Europe Media Monitor Family of Applications <http://langtech.jrc.ec.europa.eu/Documents/09_SIGIR-WS_Steinberger+frontmatter.pdf>. In: Fredric Gey, Noriko Kando & Jussi Karlgren (eds.): Information Access in a Multilingual World - Proceedings of the SIGIR 2009 Workshop (SIGIR-CLIR'2009), pp. 1-8. Boston, USA. 23 July 2009.

Greetings, currently from LREC in Istanbul, and best wishes for your interesting effort.


Ralf Steinberger

European Commission – Joint Research Centre (JRC)

URL of the lab: http://langtech.jrc.ec.europa.eu/

From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of Hamid Reza Ghader
Sent: 24 May 2012 10:01
To: corpora at uib.no
Subject: [Corpora-List] NLP labs that have active projects on Persian

Dear scientists,

We are compiling a list of all NLP labs around the world that have active projects on the Persian language. If your lab has any project related to Persian, please send me your lab's name and homepage address. I would also appreciate a brief description of your Persian-related project.

Regards,
Hamidreza Ghader
Natural language and Text processing Laboratory
School of Electrical and Computer Engineering
University of Tehran
Iran
http://ece.ut.ac.ir/nlp/
