[Corpora-List] CfP: Workshop TALaRE 2015 - NLP for French and European Regional Languages

Delphine Bernhard dbernhard at unistra.fr
Mon Feb 23 10:19:59 CET 2015

Call for papers

Workshop TALaRE 2015: Natural Language Processing for French and European Regional Languages (Traitement Automatique des Langues Régionales de France et d'Europe)

Held in conjunction with TALN 2015 (22e conférence sur le Traitement Automatique des Langues Naturelles, Caen, France, June 22-25, 2015)

Natural language processing for under-resourced languages is currently an active research area, set within a global perspective of cultural heritage preservation. Regional languages generally fall into this category, as electronic resources for them are rare and sometimes non-existent. Providing electronic resources for these languages (including written corpora, lexicons and dictionaries) is a major asset for supporting their dissemination, teaching, preservation or standardization. Among other things, it is necessary to develop text corpora, which are the most representative of language use, by collecting works of various genres (literature, theater, poetry, storytelling, press, ...) and, for some languages, by taking variation into account (dialectal, phonological or graphical variation).

The logical second step is to enrich the corpora with annotations. The development of annotated corpora for regional languages raises many methodological issues. It is not always possible to directly transpose existing models developed for resource-rich languages, partly because regional languages are less standardized than national languages.

The corpora are also a basis for the development of dictionaries, lexicons and glossaries, and are necessary for describing the actual use of a language. Conversely, dictionaries and lexicons are needed to support the development of corpora and their annotation (optical character recognition, lemmatization and morpho-syntactic analysis). When such resources already exist for a language or a language variety (dictionaries, lexicons, bilingual glossaries pairing a regional and a national language), the question arises of how the information they contain can be shared and possibly enriched with additional annotations (phonetic, morphosyntactic, syntactic, ...).
Finally, corpora and lexicons are necessary for the development of natural language processing tools (such as morpho-syntactic and syntactic analyzers). The issue is then how best to take advantage of these resources, which are often incomplete, when developing such tools.

Beyond the technical and methodological challenges, the more pragmatic difficulties related to the lack of financial and human resources to carry out the creation of resources should not be neglected. This workshop aims to bring together researchers involved in the creation of language resources and "basic" NLP tools for French and European regional languages, in order to share their views, methodologies and techniques.

We invite submissions on the creation of resources and tools for regional or minority languages of France and Europe (including languages of the overseas departments and territories of France). Submissions may concern completed work or preliminary studies.

Topics of interest include, but are not limited to:

* Resources: written and oral corpus building, including transcriptions; development of lexicons, dictionaries, glossaries

* Tools: scanning, OCR and text encoding; linguistic annotations (manual and automatic, for morpho-syntactic or syntactic analysis, ...); corpus management and querying

* Articulation between theory and practice when dealing with variation

* Road maps for resources and tools


* Paper submission deadline: April 5, 2015

* Notification of paper acceptance: May 4, 2015

* Deadline for camera-ready versions: May 22, 2015


Papers should be written in French by French-speaking authors and in English by non-French-speaking authors. Long papers may be up to 12 pages in the TALN 2015 format; short papers may be up to 6 pages. A LaTeX style file and an MS Word and OpenOffice template are available on the conference website (https://taln2015.greyc.fr/soumissionstaln/). Accepted papers will be presented during the workshop. The selection criteria will be the same as those applied to TALN 2015 research articles.

Authors should submit the papers in PDF through the submission page at



Marianne Vergez-Couret, CLLE-ERSS, Université de Toulouse 2

Delphine Bernhard, LILPA, Université de Strasbourg

Anne-Laure Ligozat, LIMSI-CNRS/ENSIIE

Jean-Michel Eloy, LESCLAP, Université de Picardie

Christophe Rey, LESCLAP, Université de Picardie

* *


Vincent Berment, INALCO, Paris

Myriam Bras, CLLE-ERSS, Université de Toulouse 2

Alain Dawson, LESCLAP, Université de Picardie

Nuria Gala, LIF, Aix-Marseille Université

Nabil Hathout, CLLE-ERSS, Université de Toulouse 2

Mai Ho Dac, CLLE-ERSS, Université de Toulouse 2

Joseph Mariani, IMMI, LIMSI-CNRS

Jean-Marie Pierrel, ATILF, Université de Lorraine & CNRS

Sophie Rosset, LIMSI-CNRS

Yves Scherrer, LATL, Centre universitaire d'informatique, Université de Genève

Claudia Soria, CNR-ILC, Italie

Amalia Todirascu, LiLPa, Université de Strasbourg

Assaf Urieli, Joliciel & CLLE-ERSS, Université de Toulouse 2

Pascal Vaillant, LIMICS, Université Paris 13

Contact: Marianne Vergez-Couret (vergez at univ-tlse2.fr)
