[Corpora-List] SemEval discussion at NAACL 2019

Valerio Basile basile at di.unito.it
Wed Jun 19 13:46:11 CEST 2019

Hi all,

thank you Ted for bringing up this discussion, in which I also participated in person at SemEval as one of the task organizers (Task 5). I personally support the idea of a stronger emphasis on the report and the analysis, rather than on the leaderboard. A "best analysis" award, giving the oral slot to the best-described system, or even keeping on the leaderboard only the teams that submitted a report are all good ideas in this direction.

I would also like to point to an alternative approach, that is, making the tasks *harder*, to encourage the participation of those who intend to study semantic phenomena rather than only obtain a high score. Here by harder I mean *formally* harder, in terms of structure, though not necessarily "AI-hard". From the organization of Task 5 (approx. 70 participating teams), and the results presented for other, more popular tasks, I got the impression that if you throw a text classification-like task at the crowd, the crowd will throw the latest neural network at it, very often with minimal (if any) study of the problem the task aims to model, neither before (motivation for using a specific model for a specific phenomenon) nor after (why the model performed as it did). As a result, at the end of a task like that, we have learned several new acronyms, but nothing about the actual problem, whether it be hate speech, offensive language, sentiment, or whatever else. Nothing wrong with neural networks, of course; these and related supervised learning paradigms are central to NLP. But perhaps this kind of benchmarking of neural models on datasets is not completely in line with the overall goal of SemEval?

In hindsight, in Task 5 (hate speech detection), I would have liked to ask the systems to predict, e.g., "why this message can be considered hateful", "what is the target of hate", or "what hate speech law does this message violate", rather than "HS/not-HS". Understandably, casting tasks as more complex than tweet-level text classification makes them harder to organize, and would probably lead to far fewer submissions, so my proposal could be a bit extreme. Maybe the right solution is somewhere in the middle.


On Sun, Jun 16, 2019 at 5:00 PM Ted Pedersen <tpederse at d.umn.edu> wrote:

> Greetings all,
> I posted this to various SemEval lists and Twitter, but was also
> encouraged to send it here (to Corpora). Apologies if you've seen this
> before!
> -----------------
> The SemEval workshop took place during the last two days of NAACL 2019
> in Minneapolis, and included quite a bit of discussion both days about
> the future of SemEval. I enjoyed this conversation (and participated
> in it), so wanted to try and share some of what I think was said.
> A few general concerns were raised about SemEval - one of them is that
> many teams participate without then going on to submit papers
> describing their systems. Related to this is that there are also
> participants who never even really identify themselves to the task
> organizers, and in effect remain anonymous throughout the event. In
> both cases the problem is that in the end SemEval aspires to be an
> academic event where participants describe what they have done in a
> form that can be easily shared with other participants (and papers are
> a good way to do that).
> My own informal estimate is that maybe half of participating teams
> submit a paper, and then half of those go on to attend the workshop
> and present a poster. So if you see a task with 20 teams, perhaps 10
> of them submit a paper and maybe 5 present a poster. SemEval is
> totally ok with teams that submit a paper but do not attend the
> workshop to present a poster. That has long been the case, and this
> was confirmed again in Minneapolis. The goal then is to get more
> participating teams to submit papers. There was considerable
> discussion on the related issues of why don't more teams submit
> papers, and how can we encourage (or require) the submission of more
> papers?
> One point made is that SemEval participants are sometimes new to our
> community and so don't have a clear idea of what a "system description
> paper" should consist of, and so might not submit papers because they
> believe it will be too difficult or time consuming, or they just don't
> know what to do and fear immediate rejection. There was considerable
> support for the idea of providing a paper template that would help new
> authors know what is expected.
> It was also observed that when teams have disappointing results (not
> top ranked) they might feel like a paper isn't really necessary or
> might even be a bad idea. This tied into a larger discussion about the
> reality that some (many?) participants in SemEval tasks focus on their
> overall ranking and less on understanding the problem that they are
> working on. There was discussion at various points about how to get
> away from the obsession with the leaderboard, and to focus more on
> understanding the problem that is being presented by the task. A
> carefully done analysis of a system that doesn't perform terrifically
> well can shed important light on a problem, while simply describing a
> model and hyperparameter settings that might lead to high scores may
> not be too useful in understanding that same problem.
> One idea was for each task to award a "best analysis paper" and
> potentially award the authors of that paper an oral presentation
> during the workshop. Typically nearly all presentations at SemEval are
> posters, and so the oral slots are somewhat coveted and are often (but
> not always) awarded to the team with the highest rank. Shifting the
> focus of prizes and presentations away from the leaderboard might tend
> to encourage more participants to carry out such analysis and submit
> papers.
> That said, a carefully done analysis paper can be fairly time
> consuming to create and may require more pages than the typical 4 page
> limit. It was suggested that we be more flexible with page limits, so
> that teams could submit fairly minimal descriptions, or go into more
> depth on their systems and analysis. A related idea was to allow
> analysis papers to be submitted to the SemEval year X+1 workshop based
> on system participation in year X. This might be a good option to
> provide since SemEval timelines tend to be pretty tight as it stands.
> Papers sometimes tend to focus more on the horse race or bake off (and
> so analysis is limited to reporting a rank or score in the task).
> However, if scores or rankings were not released until after papers
> were submitted then this could certainly change the nature of such
> papers. In addition, a submitted paper could be made a requirement for
> appearing on the leaderboard.
> There is of course a trade off between increasing participation and
> increasing the number of papers submitted. If papers are made into
> requirements then some teams won't participate. There is perhaps a
> larger question for SemEval to consider, and that is how to increase
> the number of papers without driving away too many participants.
> Another observation that was made was that some teams never identify
> themselves and so participate in the task but are never really
> involved beyond being on the leaderboard. These could of course be
> shadow accounts created by teams who are already participating (to get
> past submission limits?), or they could be accounts created by teams
> who may only want to identify themselves if they end up ranking
> highly. Should anonymous teams be allowed to participate? I don't know
> that there was a clear answer to that question. While anonymous
> participation could be a means to game the system in some way, it
> might also be something done by those who are participating contrary
> to the wishes of an advisor or employer. If teams are reluctant to
> identify themselves for fear of being associated with a "bad" score,
> perhaps it could be possible for teams to remove scores from the
> leaderboard.
> To summarize, I got the sense that there is some interest in both
> increasing the number of papers submitted to SemEval, and also in
> making it clear that there is more to the event than the leaderboard.
> I think there were some great ideas discussed, and I fear I have done
> a somewhat imperfect job of trying to convey those here, but I don't
> want to let the perfect be the enemy of the good enough, so I'm going
> to go ahead and send this around and hope that others who have ideas
> will join in the conversation in some way.
> Cordially,
> Ted
> PS Emily Bender pointed out the following paper overlaps with some of
> the issues mentioned in my summary. I'd strongly encourage all SemEval
> organizers and participants to read through this; it is very much on
> target and presents some nice ideas about how to think about shared tasks.
> https://aclweb.org/anthology/papers/W/W17/W17-1608/
> ---
> Ted Pedersen
> http://www.d.umn.edu/~tpederse