Fellowship position at WIPO (Geneva) in Neural Machine Translation (12-24 months, Master level)
1. Organizational context
The World Intellectual Property Organization (WIPO) is a specialized agency of the United Nations. It is dedicated to developing a balanced and accessible international Intellectual Property (IP) system. As part of its mandate, WIPO translates patent applications and disseminates information about published patent applications using the PATENTSCOPE search engine (https://patentscope.wipo.int/). To make this information available worldwide, WIPO is looking for machine translation (MT) techniques to help translate patents into various languages. Our “WIPO Translate” tool is publicly available at: https://patentscope.wipo.int/translate. WIPO Translate is also used in other contexts, e.g. translating conference meetings.
In WIPO Translate, we give preference to machine learning approaches; we aim to create translation models that learn from the data. The final goal is always to provide the best MT in production in terms of quality (quality should be competitive with commercial tools), efficiency (we target quick translations, less than two seconds per sentence) and scalability (we train our translation models on world-class data in 12 different languages).
WIPO is looking for a research and development fellow to work on improving our WIPO Translate tool, focusing on Neural Machine Translation (NMT). WIPO Translate is based on open-source tools, relying mainly on the NMT tool Marian.
Neural networks are an emerging field, and we are looking for a highly motivated person skilled in machine learning and Natural Language Processing (NLP) in general. We focus on applied research: bringing the latest research developments into production.
The WIPO Global Databases Division works on the automatic translation of intellectual property documents, mainly patents (see http://patentscope.wipo.int/translate) but also other types of documents (WIPO Translate has been used in other international organizations). WIPO is especially looking for a candidate who could help maintain WIPO’s “advantage” in machine translation.
2. Duties and responsibilities
* Knowledge of neural network technologies (deep learning) to experiment with the next generation of MT,
* Ability to experiment with various techniques for setting the best parameters for specific language pairs and specific domains, for combining various input source corpora (mixing a small in-domain corpus with a larger out-of-domain corpus), for mixing multiple languages in a single NMT model, etc.
* Work on specific tools allowing a reliable automatic evaluation of NMT models (allow for quality comparison between two MT engines),
* Use feedback from human evaluation to further improve our engines (and our automatic metrics),
* Explore and develop tools to provide quality estimation metrics associated with NMT output (use human feedback to improve these metrics).
MT-integration/ Natural language processing
* Work on specific tools allowing a better integration of NMT in the user environment (batch translating texts/ documents/ HTML page),
* Design graphical user interfaces (good knowledge of Java, jQuery and/or Web/JSF 2.0 would be an advantage) to improve means of accessing the output of machine translation,
* Pre- and post-process texts in different languages to improve translation quality (especially for Chinese, Korean, Japanese and German), e.g. replacing named entities by placeholders, normalizing casing, automatically filtering parallel texts, etc.
Develop methods to collect and clean training data
* Especially on parallel patent sentences (e.g. filter/clean patent titles and abstracts, align full texts using patent priority data, etc.), but also on other types of parallel documents (aligning/filtering diplomatic documents, etc.),
* Work on combination of various sources for augmenting training data:
* use human post editions,
* use other corpora,
* use synthetic data,
* Define a workflow for updating NMT models using newly published documents (e.g. incremental training, explore online learning algorithms…).
Advanced university degree in Information Technology.
At least one year of experience in the field of Machine Learning (R&D projects or PhD can be considered as experience).
A specialization in machine learning techniques.
Familiarity with computational linguistics.
Experience in implementing Machine Learning projects and/or making significant scientific contribution in Machine Learning.
Essential: A strong knowledge of a programming language is required (Java and/or Python).
A good knowledge of Unix is required.
Statistics: automatic document classification approaches (Neural Networks, Naive Bayes, SVM, k-NN, EM, ANNs, etc.).
Scripting languages: bash, Python, Perl.
Unix: Ubuntu, Red Hat, configuring Unix remote servers (using command line mode).
Experience with version control systems (SVN and/or Git) and deployment strategies (Docker, cloud servers, virtual machines, etc.).
Ability to write user guides, administration documentation and reports in English.
Excellent decision-making and problem-solving skills.
A proven background in research (scientific publications) will be a strong plus.
Search engines: Lucene / ElasticSearch / Solr.
Databases: NoSQL techniques, MySQL, Oracle, etc.
Excellent knowledge of written and spoken English or excellent knowledge of written and spoken French and good knowledge of English.
Knowledge of other official UN languages would be a plus. A working knowledge of other official languages of WIPO (German, Spanish, French, Portuguese, Russian, Arabic, Chinese, Japanese or Korean) would be an advantage.
4. Organizational competencies
1. Communicating effectively. 2. Showing team spirit. 3. Demonstrating integrity. 4. Valuing diversity. 5. Producing results. 6. Showing service orientation. 7. Seeing the big picture. 8. Seeking change and innovation. 9. Developing yourself and others.
5. Terms and conditions
a) Term of fellowship: up to 12 months, with the possibility of renewal for up to an additional 12 months, for a maximum of two years.
b) Anticipated Start Date: March 15, 2022.
c) Location: WIPO Headquarters, Geneva, Switzerland.
d) Stipend: set in accordance with the level of qualifications and experience of the Fellow (stipend starts from 5000 CHF/month).
e) Travel expenses: one economy class ticket (return) from/to the Fellow’s place of residence by the most direct and economical route to/from Geneva. If necessary, WIPO may provide assistance in obtaining an entry visa to Switzerland.
f) WIPO provides medical and accident insurance coverage during the course of the fellowship.
g) WIPO will request a “carte de légitimation” for the Fellow, which serves as a residence and work permit, from the Permanent Mission of Switzerland to the United Nations Office and to the other international organizations in Geneva. Family members of the Fellow are not eligible for a “carte de légitimation.”
h) Tax and social security: Fellows shall be solely responsible for meeting any taxation and social security obligations that may arise directly or indirectly from their contract with WIPO.
Fellows are not staff members of WIPO and the position does not lead to any employment rights and entitlements beyond the terms of the fellowship.
How do I register my interest?
Expressions of interest, formulated through a brief statement by the candidate addressing each of the requirements set out above, accompanied only by a full curriculum vitae (résumé), should be sent by email to: bruno.pouliquen at wipo.int by February 8, 2022 (23:59 UTC).
Only those candidates who are shortlisted for a written test or an interview (via videoconference) will be contacted.
Note for PhD students: WIPO cannot provide academic support for supervising a PhD; however, a special agreement with a local supervisor can be arranged.
-- Bruno Pouliquen