workshop co-located with COLING 2020 (12/12/2020)
Paper submission *deadline*: September 1st, 2020
*1 BACKGROUND*
Words are important: they support us in many tasks (thinking, searching, memorising and communicating). Hence, one may wonder how to build tools supporting their learning and usage (access/navigation). Alas, the answer is not quite as straightforward as it may seem. It depends on various factors: the questioner's background (lexicography, psychology, computer science), the task (production/reception), and the material support (hardware). Words in books, computers and the human brain are not the same. Being aware of this, different communities have focused on different issues (dictionary building; creation of navigational tools; representation and organisation of words; time course for accessing a word, etc.), yet their views and respective goals have changed considerably over time.
The lexicon is no longer considered a static entity in which discrete units (words) are organised alphabetically (database view). Dictionaries are now viewed dynamically, i.e., as lexical graphs whose entities are linked in various ways (topical relations; associations) and whose link weights may vary over time. While lexicographers view words as products (holistic entities), psychologists and neuroscientists view them as processes (decomposition), involving various steps or layers (representations) between an input and an output.
Computational linguists have their own ways of looking at words, and their proposals have also changed quite a bit during the last decade. Discrete count-based vector representations have successively been replaced by continuous vectors (i.e., word embeddings) and then by language-model-based contextualised representations. The latter are more powerful than the other forms, as they are able to account for contextual ambiguity, outperforming the static models (including word embeddings) in a broad range of tasks.
As one can see, different communities look at words from different angles, which can be an asset, as complementary views may help us to broaden and deepen our understanding of this fundamental cognitive resource. Yet, this diversity of perspectives can also be a problem, particularly when a field moves as rapidly as ours. It becomes harder and harder for everyone, including experts, to remain fully informed about the latest changes (state of the art). This is one of the reasons why we organise this workshop. More precisely, our goal is not only to keep people informed without overwhelming them with the information glut, but also to help them perceive clearly what is new and relevant, hence important. Last, but not least, we would like to connect people from different communities in the hope that this may help them to gain new insights or inspirations.
*2 SCOPE and TOPICS*
This workshop is about possible enhancements of lexical resources (representation, organisation of the data, etc.). To allow for this we invite researchers to submit their contributions. The idea is to discuss the limitations of existing resources and to explore possible enhancements that take into account the users’ and the engineers' needs (computational aspects).
Also, given the success of the shared task devoted to the corpus-based identification of semantic relations (CogALex-V, 2016), we propose another edition, this time adding a multilingual component. Our special focus will be on paradigmatic semantic relations, such as synonymy, antonymy and hypernymy, which are notoriously difficult for classical word embedding models to distinguish.
For this workshop we solicit papers including but not limited to the following topics, each of which can be considered from various points of view: linguistics (lexicography, computational- or corpus linguistics), neuro- or psycholinguistics (tip-of-the-tongue problem, word associations), network-related sciences (vector-based approaches, graph theory, small-world problem), and so on.
*Organization, i.e. structure of the lexicon*
- Micro- and macrostructure of the lexicon;
- Indexical categories (taxonomies, thesaurus-like topical structures, etc.);
- Map of the lexicon (topology) and relations between words (word associations).
*The meaning of words and how to reveal it*
- Lexical representation (holistic, decomposed);
- Meaning representation (concept based, primitives);
- Distributional semantics (count models, neural embeddings, etc.).
*Analysis of the conceptual input given by a dictionary user*
- What information do language producers typically provide when looking for a word (terms, relations)?
- What kind of relational information do they give: typed or untyped relations?
- Which relations are typically used?
*Methods for crafting dictionaries or indexes*
- Manual, automatic or collaborative building of dictionaries and indexes (crowdsourcing, serious games, etc.);
- Extraction of associations from corpora to build semantic networks supporting navigation;
- (Semi-)automatic induction of the link type (e.g., synonym, hypernym, meronym, ...).
*Creation of new types of dictionaries*
- Concept dictionary;
- Dictionary of larger segments than words (clauses, phrasal elements);
- Dictionary of patterns or concept-patterns;
- Dictionary of syllables.
*Dictionary access (navigation and search strategies), interface issues*
- Search based on sound (rhymes), meaning or contextually related words (associations);
- Determination of appropriate search space based on the user's cognitive state (information available at the onset: query) and meta-knowledge (knowledge concerning the relationship between the input and the target word), ...
- Identification of typical word access strategies (navigational patterns) used by people;
- Interface problems, data visualisation.
*3 WORKSHOP SUBMISSIONS*
The workshop features two tracks:
- A regular research track, where the submissions must be substantially original. For details, see: https://sites.google.com/view/cogalex-2020/home/submissions
- A shared task track, with submissions consisting of system description papers.
For details see :
*4 IMPORTANT DATES*
- Paper submission deadline: September 1, 2020
- Notification of acceptance: October 10, 2020
- Camera-ready papers due: October 25, 2020
- Release of development data: August 1, 2020
- Release of test data: September 1, 2020
- Announcement of winners: October 1, 2020
- Shared task papers due: October 20, 2020
*5 INVITED SPEAKER*
Alex Arenas (http://deim.urv.cat/~alexandre.arenas/)
Alephsys Lab, Computer Science & Mathematics,
Universitat Rovira i Virgili, 43007 Tarragona, Spain
*6 WORKSHOP ORGANISERS*
- Michael Zock (LIS, CNRS, AMU, Marseille, France)
- Alessandro Lenci (Computational Linguistics Laboratory, University of Pisa, Italy)
- Enrico Santus (Bayer, Whippany, NJ, 07981, USA)
- Emmanuele Chersoni (Hong Kong Polytechnic University, Hong Kong, China)
*7 PROGRAM COMMITTEE*
See: https://sites.google.com/view/cogalex-2020/home/programme-committee
For general questions, please get in touch with Michael Zock (michael.zock at lis-lab.fr).
Concerning the shared task, contact Rong Xiang (csrxiang at comp.polyu.edu.hk) or Emmanuele Chersoni (emmanuelechersoni at gmail.com).