We build models that work well on our datasets, but when we play with them, we are surprised to find that they are brittle and break. Let’s analyze their failings and propose new evaluations & models.

@NAACL 2018

website: https://newgeneralization.github.io/

*** NEW *** Deadline for submission: March 16, 2018 *** NEW ***

------------- Overview -------------

Deep learning has brought a wealth of state-of-the-art results and new capabilities. Although methods have achieved near human-level performance on many benchmarks, numerous recent studies suggest that these benchmarks only weakly test what they are intended to measure, and that simple examples, produced either by humans or by machines, cause systems to fail spectacularly. Such surprising failures, combined with the inability to interpret state-of-the-art models, have eroded confidence in our systems. While these systems are not perfect, the real flaw lies with our benchmarks, which do not adequately measure a model’s ability to generalize and are thus easily gameable.

This workshop provides a venue for exploring new approaches for measuring and enforcing generalization in models.

We are soliciting submissions in the following areas:

(1) Analysis of existing models and their failings
(2) Creation of new evaluation paradigms, e.g. zero-shot learning, Winograd schemas, and datasets that avoid explicit types of gaming
(3) Modeling advances, e.g. regularization, compositionality, interpretability, inductive bias, multi-task learning, and other methods that promote generalization

We accept archival submissions of 2 pages for topic (1), archival submissions of 4 pages for topics (2) and (3), and non-archival cross-submissions.

The workshop will include speakers, a panel, and a poster session for submitted work.

Submission site: https://www.softconf.com/naacl2018/Gen-Deep18/

Deadline for submission: March 16, 2018
Notification of acceptance: April 2, 2018
Camera ready: April 16, 2018
Workshop date: TBD

