The European Commission's Joint Research Centre (JRC <http://ec.europa.eu/dgs/jrc/index.cfm>) in Ispra, on Lago Maggiore in Northern Italy, has an opening for a post-doc position in multilingual text analysis (see below). The JRC runs several public news aggregation and analysis web portals (see http://emm.jrc.it/overview.html) and provides a number of services to a wide range of international customers. The JRC's work has a strong focus on multilinguality and on tools that provide cross-lingual information access.
Applications (3-page application form <http://ipsc.jrc.ec.europa.eu/job/appl_form_grantholders.xls> and an updated CV in English <http://ipsc.jrc.ec.europa.eu/job/EU_CV_template_EN.doc>) should be submitted by e-mail to: JRC-IPSC-GRANTHOLDERS at ec.europa.eu.
According to the Vademecum for grantholders (see http://ipsc.jrc.ec.europa.eu/showdoc.php?doc=job/VademecumforGholders2008.pdf), the remuneration is about 54,000 euro per year plus allowances.
Automatic Multilingual Text Analysis
CALL REFERENCE NO.: IPSC/G02/5
Category: Post-Doc researcher (category 30)
Duration: 36 months
Remuneration: see Vademecum for grantholders <http://ipsc.jrc.ec.europa.eu/showdoc.php?doc=job/VademecumforGholders2008.pdf>
URL generic call: http://ipsc.jrc.ec.europa.eu/jobs.php?id=8
URL specific post: http://ipsc.jrc.ec.europa.eu/showgrant.php?id=7
In the Web Mining and Intelligence (EMM) activity, the successful candidate will carry out research on automatic multilingual text analysis. Subjects currently being studied include automatic event extraction, automatic entity recognition and cross-language clustering.
These techniques are already deployed in several operational applications, and part of the work will be in support of these applications. The ongoing research has a strong focus on applicability in a multilingual environment.
A new area of research is the automatic generation of summaries from multi-document texts, in particular from news article clusters. The work is highly practical and goal oriented. Research results are expected to be used operationally. The system within which the results will be deployed is implemented in Java as a set of servlets in Tomcat.
University degree in computer science or computational linguistics. Doctoral degree in a similar discipline, or equivalent work experience of 5 years. Good programming skills, preferably in Java, are recommended. The working language of the action is English, and strong English language skills are required. Given the multilingual aspect of the work, active knowledge of at least one other language and an understanding of at least one more are also required.
Good knowledge of Arabic would be seen as an asset.
Ralf Steinberger (Ralf.Steinberger at jrc.it)
European Commission - Joint Research Centre (JRC)
IPSC - SeS - EMM
URL: Applications: http://emm.jrc.it/overview.html
URL: The science behind them: http://langtech.jrc.it/
The JRC's Language Technology group specialises in the development of highly multilingual text analysis tools and in cross-lingual applications. Many applications are accessible online, e.g.:
* NewsExplorer <http://press.jrc.it/NewsExplorer/>: multilingual news aggregation and analysis (19 languages); allows users to navigate the news over time and across languages; trend analysis; collects information about people from the news; social network detection.
* NewsBrief <http://press.jrc.it/>: breaking news detection and display of the very latest thematic news from around the world; email alerting (22+ languages).
* MedISys Medical Information System <http://medusa.jrc.it/>: latest health-related news from around the world according to themes and diseases (22+ languages).
* EMM-Labs <http://emm-labs.jrc.it:8080/>: latest developments; social networks; live people-in-the-news; country and theme fact sheets; maps showing violent events world-wide.
JRC-Acquis Multilingual Parallel Corpus (Version 3)
* Freely available for research purposes.
* 22 languages: Bulgarian, Czech, Danish, German, Greek, English, Spanish, Estonian, Finnish, French, Hungarian, Italian, Lithuanian, Latvian, Maltese, Dutch, Polish, Portuguese, Romanian, Slovak, Slovene and Swedish.
* Over 1 billion words in total.
* Sentence alignment for 231 language pairs.
* For more information and download, see http://langtech.jrc.it/JRC-Acquis.html.
DGT-TM Translation Memory
* Freely available for research purposes.
* Aligned translation units for 231 language pairs.
* Alignment manually verified.
* For more information and download, see http://langtech.jrc.it/DGT-TM.html.