[Corpora-List] Internship (stage M2) in deep learning for natural language processing at LIPN, Univ. Paris 13

Nadi Tomeh nadi.tomeh at gmail.com
Mon Dec 16 01:10:33 CET 2019

(Apologies for cross-posting)

Title: Multitask Learning of Easy-first Hierarchical Tree LSTMs for Joint Syntactic and Semantic Arabic Dependency Parsing

Context: Collaboration between RCLN (https://lipn.univ-paris13.fr/accueil/equipe/rcln/), LIPN, Université Paris 13, and CAMeL Lab (https://bit.ly/2M0XsAG), New York University Abu Dhabi

Host lab: LIPN, Université Paris 13, 99 Avenue Jean Baptiste Clément, 93430 Villetaneuse

Supervisors: Joseph Le Roux and Nadi Tomeh

Collaborators: Nizar Habash and Dima Taji

Start date: February 2020

Duration: 6 months

Salary: 550 euros/month

Profile and required skills:


Masters in Computer Science, Computational Linguistics, Applied Mathematics, or Statistics

Knowledge in Natural Language Processing and Deep Learning is highly

Programming skills in Python (and libraries such as pytorch, numpy, or

How to apply: send CV and available Masters' grades to tomeh at lipn.fr and leroux at lipn.fr


In recent work on semantic parsing, Peng et al. [2017; 2018] and Kurita and Søgaard [2019] showed that the overlap between three different theories of semantics and their corresponding representations can be exploited to improve performance on all three tasks. This is done using multitask learning in a deep neural architecture. We would like to explore ways in which this approach can be applied to Arabic, which has rich morphology and complex morpho-syntactic interactions. We will work with two different dependency representations. The first is the Columbia Arabic Treebank (CATiB) representation [Habash and Roth, 2009], which is inspired by traditional Arabic grammar and focuses on modeling syntactic and morpho-syntactic agreement and case assignment. The second is the Universal Dependencies (UD) representation for Arabic [Taji et al., 2017], which focuses relatively more on semantic/thematic relations within the sentence, and whose design is coordinated with that of a number of other languages [Nivre et al., 2016]. The two representations complement each other and stand to benefit from multitask learning approaches.

In this context, we propose to:

(i) Extend the easy-first hierarchical LSTM parser of Kiperwasser and Goldberg [2016] to multitask settings. We have shown that this approach can be useful for joint lexical segmentation and dependency parsing [Constant et al., 2016]. In that work we used as our single-task model the easy-first parser of Goldberg and Elhadad [2010] trained with dynamic oracles [Goldberg and Nivre, 2013];

(ii) Apply the model to parse Arabic sentences to both CATiB and UD representations;

(iii) Employ multitask modeling insights from Peng et al. [2017; 2018] and Kurita and Søgaard [2019] to enhance the multitask easy-first parser.
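For readers unfamiliar with easy-first parsing, the greedy loop at the heart of the approach can be sketched as follows. This is only a minimal illustration, not the parser of Goldberg and Elhadad [2010] or Kiperwasser and Goldberg [2016]: the names easy_first_parse and toy_score and the toy weights are hypothetical, and the scorer here looks only at the head and child words, whereas the real models score attachments from features or tree-LSTM encodings of the partial trees.

```python
from typing import Callable, Dict, List

def easy_first_parse(tokens: List[str],
                     score: Callable[[str, str], float]) -> Dict[int, int]:
    """Greedy easy-first parse. Returns child index -> head index (root -> -1)."""
    pending = list(range(len(tokens)))  # roots of partial trees, in sentence order
    heads: Dict[int, int] = {}
    while len(pending) > 1:
        best = None  # (score, position of child in pending, head, child)
        for pos in range(len(pending) - 1):
            left, right = pending[pos], pending[pos + 1]
            # Each adjacent pair offers two actions: attach right under left,
            # or attach left under right.
            for head, child, child_pos in ((left, right, pos + 1),
                                           (right, left, pos)):
                s = score(tokens[head], tokens[child])
                if best is None or s > best[0]:
                    best = (s, child_pos, head, child)
        _, child_pos, head, child = best
        heads[child] = head       # commit the easiest (highest-scoring) attachment
        del pending[child_pos]    # the child is no longer available as a head
    heads[pending[0]] = -1        # the last remaining item is the root
    return heads

# Hypothetical toy weights, for illustration only.
TOY_WEIGHTS = {("cat", "the"): 2.0, ("sleeps", "cat"): 1.0}

def toy_score(head_word: str, child_word: str) -> float:
    return TOY_WEIGHTS.get((head_word, child_word), 0.0)

parse = easy_first_parse(["the", "cat", "sleeps"], toy_score)
```

In a multitask setting, the score argument is where a task-specific scorer (for example, one per dependency representation, sharing a common encoder) would plug in; how best to share parameters is exactly what the internship would investigate.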



Peng, Hao, Sam Thomson and Noah A. Smith. “Deep Multitask Learning for Semantic Dependency Parsing.” ACL (2017).

Peng, Hao, Sam Thomson, Swabha Swayamdipta and Noah A. Smith. “Learning Joint Semantic Parsers from Disjoint Data.” NAACL-HLT (2018).


Kurita, Shuhei and Anders Søgaard. “Multi-Task Semantic Dependency Parsing with Policy Gradient for Learning Easy-First Strategies.” ACL (2019).



Nizar Habash and Ryan M. Roth. “CATiB: The Columbia Arabic Treebank.” Proceedings of Annual Meeting of the Association for Computational Linguistics, 2009.

Dima Taji, Nizar Habash, and Daniel Zeman. “Universal Dependencies for Arabic.” Proceedings of the Workshop on Arabic Natural Language Processing (with EACL), 2017.

Yoav Goldberg and Michael Elhadad. 2010. An efficient algorithm for easy-first non-directional dependency parsing. In Human Language Technologies: NAACL, pages 742–750, Los Angeles, California.

Eliyahu Kiperwasser and Yoav Goldberg. 2016. Easy-first dependency parsing with hierarchical tree LSTMs. Transactions of the Association for Computational Linguistics, 4, 445-461.


Mathieu Constant, Joseph Le Roux, Nadi Tomeh. Deep Lexical Segmentation and Syntactic Parsing in the Easy-First Dependency Framework. NAACL, 2016, San Diego, United States.
