[Corpora-List] HumEval Workshop on Human Evaluation of NLP Systems at EACL'21: Call for Participation

Agarwal, Shubham sa201 at hw.ac.uk
Thu Apr 8 11:48:59 CEST 2021


======================================================
HumEval Workshop on Human Evaluation of NLP Systems at EACL'21
19 April 2021
Online from Kyiv, Ukraine
https://humeval.github.io/
======================================================

CALL FOR PARTICIPATION

Early registration ends: 7 April 2021

Invited Speakers: Margaret Mitchell and Lucia Specia

Open-mic session: For information on how to participate in this open discussion session, please see the programme below.

Programme<https://humeval.github.io/programme/>

09:00–09:10

Opening

Chair: Anya Belz

09:10–10:00

Invited Talk: Disagreement in Human Evaluation: Blame the Task not the Annotators

by Lucia Specia<https://www.imperial.ac.uk/people/l.specia>, Imperial College London and University of Sheffield

It is well known that human evaluators are prone to disagreement and that this is a problem for reliability and reproducibility of evaluation experiments. The reasons for disagreement can fall into two broad categories: (1) human evaluator, including under-trained, under-incentivised, lacking expertise, or ill-intended individuals, e.g., cheaters; and (2) task, including ill-definition, poor guidelines, suboptimal setup, or inherent subjectivity. While in an ideal evaluation experiment many of these elements will be controlled for, I argue that task subjectivity is a much harder issue. In this talk I will cover a number of evaluation experiments on tasks with variable degrees of subjectivity, discuss their levels of disagreement along with other issues, and cover a few practical approaches to address them. I hope this will lead to an open discussion on possible strategies and directions to alleviate this problem.

10:00–11:00

Oral Session 1 (NLG)

10:00–10:20

It’s Commonsense, isn’t it? Demystifying Human Evaluations in Commonsense-Enhanced NLG systems

Miruna-Adriana Clinciu, Dimitra Gkatzia and Saad Mahamood

10:20–10:40

Estimating Subjective Crowd-Evaluations as an Additional Objective to Improve Natural Language Generation

Jakob Nyberg, Maike Paetzel and Ramesh Manuvinakurike

10:40–11:00

Trading Off Diversity and Quality in Natural Language Generation

Hugh Zhang, Daniel Duckworth, Daphne Ippolito and Arvind Neelakantan

11:00–11:30

Break

11:30–12:10

Oral Session 2 (MT)

11:30–11:50

Towards Document-Level Human MT Evaluation: On the Issues of Annotator Agreement, Effort and Misevaluation

Sheila Castilho

11:50–12:10

Is This Translation Error Critical?: Classification-Based Human and Automatic Machine Translation Evaluation Focusing on Critical Errors

Katsuhito Sudoh, Kosuke Takahashi and Satoshi Nakamura

12:10–13:30

Poster Session

- Towards Objectively Evaluating the Quality of Generated Medical Summaries

Francesco Moramarco, Damir Juric, Aleksandar Savkov and Ehud Reiter

- A Preliminary Study on Evaluating Consultation Notes With Post-Editing

Francesco Moramarco, Alex Papadopoulos Korfiatis, Aleksandar Savkov and Ehud Reiter

- The Great Misalignment Problem in Human Evaluation of NLP Methods

Mika Hämäläinen and Khalid Alnajjar

- A View From the Crowd: Evaluation Challenges for Time-Offset Interaction Applications

Alberto Chierici and Nizar Habash

- Reliability of Human Evaluation for Text Summarization: Lessons Learned and Challenges Ahead

Neslihan Iskender, Tim Polzehl and Sebastian Möller

- On User Interfaces for Large-Scale Document-Level Human Evaluation of Machine Translation Outputs

Roman Grundkiewicz, Marcin Junczys-Dowmunt, Christian Federmann and Tom Kocmi

- Eliciting Explicit Knowledge From Domain Experts in Direct Intrinsic Evaluation of Word Embeddings for Specialized Domains

Goya van Boven and Jelke Bloem

- Detecting Post-Edited References and Their Effect on Human Evaluation

Věra Kloudová, Ondřej Bojar and Martin Popel

13:30–15:00

Lunch

15:00–15:40

Oral Session 3

15:00–15:20

A Case Study of Efficacy and Challenges in Practical Human-in-Loop Evaluation of NLP Systems Using Checklist

Shaily Bhatt, Rahul Jain, Sandipan Dandapat and Sunayana Sitaram

15:20–15:40

Interrater Disagreement Resolution: A Systematic Procedure to Reach Consensus in Annotation Tasks

Yvette Oortwijn, Thijs Ossenkoppele and Arianna Betti

15:40–16:40

Open-Mic Discussion Panel

Chair: Ehud Reiter

The discussion session will be open to all participants. Anyone who is interested in speaking for 3 minutes about any topic relevant to the workshop should email Ehud Reiter (e.reiter at abdn.ac.uk). We will follow these short presentations with a general discussion.

16:40–17:00

Break

17:00–17:50

Invited Talk: The Ins and Outs of Ethics-Informed Evaluation

by Margaret Mitchell<http://www.m-mitchell.com/>

The modern train/test paradigm in Artificial Intelligence (AI) and Machine Learning (ML) narrows what we can understand about AI models, and skews our understanding of models’ robustness in different environments. In this talk, I will work through the different factors involved in ethics-informed AI evaluation, including connections to ML training and ML fairness, and present an overarching evaluation protocol that addresses a multitude of considerations in developing ethical AI.

17:50–18:00

Closing

Best,

Shubham https://shubhamagarwal92.github.io/
