[Corpora-List] Open position at the EC's Joint Research Centre: writing IE grammars

Ralf Steinberger ralf.steinberger
Tue Apr 7 14:44:02 CEST 2009

The European Commission's Joint Research Centre (JRC, <http://ec.europa.eu/dgs/jrc/index.cfm>) in Ispra, on Lago Maggiore in northern Italy, has another opening for a three-year position in multilingual text analysis (see below). Applicants need either a completed Ph.D. or five years of relevant post-graduate experience.

The JRC runs several public news aggregation and analysis web portals (see <http://emm.jrc.it/overview.html>) and provides a number of services to a wide range of international customers. The JRC's work focuses strongly on multilinguality and on tools that provide cross-lingual information access.

Applications (3-page application form <http://ipsc.jrc.ec.europa.eu/job/appl_form_grantholders.xls>, an updated CV in English and a copy of your passport/ID card) should be submitted by e-mail to JRC-IPSC-GRANTHOLDERS at ec.europa.eu by midnight CET on 30 April 2009.

According to the Vademecum for grant holders (see <http://ipsc.jrc.ec.europa.eu/showdoc.php?doc=job/VademecumforGholders2008.pdf>), the remuneration is about 54,000 Euro/year plus allowances.

------------------------------------------------------------------------------

Multilingual text analysis - writing grammars for information extraction


Category: Category 30 (Requires Ph.D. or five years of relevant post-graduate experience)

Duration: 36 months

Action: OPTIMA

Remuneration and conditions: see Vademecum for grant holders <http://ipsc.jrc.ec.europa.eu/showdoc.php?doc=job/VademecumforGholders2008.pdf>

URL generic call: http://ipsc.jrc.ec.europa.eu/jobs.php?id=8

URL specific post: http://ipsc.jrc.ec.europa.eu/showgrant.php?id=124






30. Post-Doc researcher



OPTIMA <http://ipsc.jrc.ec.europa.eu/showaction.php?id=18>

Applications must be delivered before 30 April 2009, 23:59:59 CET.

The Internet is the richest reservoir of human knowledge that has ever existed. Advanced software tools are needed to monitor and process the vast amount of material available on-line. The Action OPTIMA (OPensource Text Information Mining and Analysis) develops innovative solutions for retrieving and extracting information from the Internet, and especially from online news and blogs. It serves many Commission Services, EU agencies and some EU Member State authorities. The core of this action is the Europe Media Monitor (EMM).

Examples of current work are automatic sentiment analysis, multilingual multi-document summarisation, event extraction, automatic entity recognition and name variant mapping, as well as various cross-language applications. Rule-based, machine learning and hybrid methods are used to achieve these goals.

These techniques are already deployed to some extent in several operational applications (see http://press.jrc.it/overview.html), and part of the work will be in support of these applications. The ongoing research has a strong focus on applicability in a multilingual environment. The work is highly practical and goal-oriented. Research results are expected to be used operationally. The candidate is expected to contribute to scientific publications of the research results.

The person we are looking for will work on research activities in the field of automatic multilingual text analysis. We are specifically looking for somebody with experience in writing robust grammars for information extraction, to complement machine learning work in this area. A large part of the candidate's work will be to help write information extraction patterns, either from scratch or by rewriting and generalising automatically learned patterns.

The system within which the results will be deployed is implemented in Java as a set of servlets in Tomcat. Good programming skills, preferably in Java, are therefore recommended.

University degree in computational linguistics, computer science or related areas.

Doctoral degree in a similar discipline, or equivalent work experience of 5 years. The working language of the action is English, and strong English language skills are therefore required. Given the multilingual aspect of the work, active knowledge of at least one other language and an understanding of at least one more are also required.

Good knowledge of Arabic, Farsi or Chinese would be seen as an asset.

Duration: 36 months

Ralf Steinberger (Firstname.Lastname at jrc.it)
European Commission - Joint Research Centre (JRC)
IPSC - SeS - OPTIMA (OPensource Text Information Mining and Analysis)
URL: Applications: http://press.jrc.it/overview.html
URL: The science behind them: http://langtech.jrc.it/

The JRC's Language Technology activity specialises in the development of highly multilingual text analysis tools and in cross-lingual applications. Many applications are accessible online, e.g.:

. NewsExplorer <http://press.jrc.it/NewsExplorer/>: multilingual news aggregation and analysis (19 languages); lets users navigate the news over time and across languages; trend analysis; collects information about people from the news; social network detection.

. NewsBrief <http://press.jrc.it/>: breaking news detection and display of the very latest thematic news from around the world; email alerting (40+ languages).

. MedISys Medical Information System <http://medusa.jrc.it/>: latest health-related news from around the world according to themes and diseases (40+ languages).

. EMM-Labs <http://emm-labs.jrc.it/>: latest developments; social networks; live people-in-the-news; country and theme fact sheets; maps showing violent events world-wide.

JRC-Acquis Multilingual Parallel Corpus (Version 3)

. Freely available for research purposes.

. 22 languages: Bulgarian, Czech, Danish, German, Greek, English, Spanish, Estonian, Finnish, French, Hungarian, Italian, Lithuanian, Latvian, Maltese, Dutch, Polish, Portuguese, Romanian, Slovak, Slovene and Swedish.

. Altogether over 1 billion words.

. Sentence alignment for 231 language pairs, using the two alternative aligners Vanilla and HunAlign.

. For more information and download, see http://langtech.jrc.it/JRC-Acquis.html.

DGT-Translation Memory

. Freely available for research purposes.

. Aligned translation units for 231 language pairs.

. Alignment manually verified.

. For more information and download, see http://langtech.jrc.it/DGT-TM.html.
