[Corpora-List] Open position at the EC's Joint Research Centre: multilingual text analysis

Ralf Steinberger ralf.steinberger at jrc.it
Tue Feb 17 09:40:56 CET 2009


The European Commission’s Joint Research Centre (JRC) in Ispra, on Lago Maggiore in Northern Italy, has an opening for a three-year position in multilingual text analysis (see below). Applicants need either a completed Ph.D. or five years of relevant post-graduate experience.


The JRC runs several public news aggregation and analysis web portals (see http://emm.jrc.it/overview.html) and provides a number of services to a wide range of international customers. The JRC’s work has a strong focus on multilinguality and on tools that provide cross-lingual information access.


Applications (3-page application form, an updated CV in English and a copy of your passport/ID card) should be sent by e-mail to JRC-IPSC-GRANTHOLDERS at ec.europa.eu by midnight CET on 15 March 2009.


According to the Vademecum for grant holders (see http://ipsc.jrc.ec.europa.eu/showdoc.php?doc=job/VademecumforGholders2008.pdf), the remuneration is about 54,000 Euro/year plus allowances.




Automatic Multilingual Text Analysis II



Category: Category 30 (Requires Ph.D. or five years of relevant post-graduate experience)

Duration: 36 months

Action: OPTIMA

Remuneration and conditions: see Vademecum for grantholders

URL generic call: http://ipsc.jrc.ec.europa.eu/jobs.php?id=8

URL specific post: http://ipsc.jrc.ec.europa.eu/showgrant.php?id=67



The Internet is the richest reservoir of human knowledge that has ever existed. Advanced software tools are needed to monitor and process the vast amount of material available on-line. The Action OPTIMA (OPensource Text Information Mining and Analysis) develops innovative solutions for retrieving and extracting information from the Internet and from other Open Sources. It serves many Commission Services, EU agencies and some member state authorities. The core of this action is the Europe Media Monitor (EMM).

Within this action, the successful candidate will carry out research on automatic multilingual text analysis. Typical subjects currently being studied are automatic event extraction, automatic entity recognition and cross-language clustering.

These techniques are already deployed to some extent in several operational applications, and part of the work would be in support of these applications. The ongoing research has a strong focus on applicability in a multilingual environment.

The work is highly practical and goal oriented. Research results are expected to be used operationally. The candidate is expected to contribute to scientific publications of the research results.

The system within which the results will be deployed is implemented in Java as a set of servlets in Tomcat. Good programming skills, preferably in Java, are therefore recommended.

University degree in computer science or computational linguistics.

Doctoral degree in a similar discipline, or equivalent work experience of 5 years. The working language of the action is English, and strong English language skills are required. Given the multilingual aspect of the work, active knowledge of at least one other language and an understanding of at least one further language are also required.

Good knowledge of Arabic, Farsi or Chinese would be seen as an asset.





Ralf Steinberger (Firstname.Lastname at jrc.it)
European Commission - Joint Research Centre (JRC)
IPSC - SeS - OPTIMA (OPensource Text Information Mining and Analysis)
URL: Applications: http://press.jrc.it/overview.html
URL: The science behind them: http://langtech.jrc.it

The JRC’s Language Technology activity specialises in the development of highly multilingual text analysis tools and in cross-lingual applications. Many applications are accessible online, e.g.:

- NewsExplorer: multilingual news aggregation and analysis (19 languages); lets users navigate the news over time and across languages; trend analysis; collects information about people from the news; social network detection.

- NewsBrief: breaking news detection and display of the very latest thematic news from around the world; email alerting (40+ languages).

- MedISys Medical Information System: latest health-related news from around the world according to themes and diseases (40+ languages).

- EMM-Labs: latest developments; social networks; live people-in-the-news; country and theme fact sheets; maps showing violent events world-wide.


JRC-Acquis Multilingual Parallel Corpus (Version 3)

- Freely available for research purposes.

- 22 languages: Bulgarian, Czech, Danish, German, Greek, English, Spanish, Estonian, Finnish, French, Hungarian, Italian, Lithuanian, Latvian, Maltese, Dutch, Polish, Portuguese, Romanian, Slovak, Slovene and Swedish.

- Over 1 billion words in total.

- Sentence alignment for 231 language pairs, using the two alternative aligners Vanilla and HunAlign.

- For more information and download, see http://langtech.jrc.it/JRC-Acquis.html.


DGT-Translation Memory

- Freely available for research purposes.

- Aligned translation units for 231 language pairs.

- Alignment manually verified.

- For more information and download, see http://langtech.jrc.it/DGT-TM.html.




