[Corpora-List] SemEval discussion at NAACL 2019

bergler at cse.concordia.ca bergler at cse.concordia.ca
Wed Jun 19 22:44:36 CEST 2019

What a wonderful exchange!

Continuing Valerio's point, I feel that the time between Task Description and competition is becoming shorter and shorter; some tasks have only a 4-month turnaround time. This is too little for the carefully crafted systems we'd like to encourage (and build). In the past, we sometimes knew about a task being offered for next year's competition. Short turnaround times encourage off-the-shelf approaches (plus they put organizers under a lot of pressure). Maybe task descriptions should go out 15 months before competition, training data 12 months before competition, and test data 2 days before competition.

Best, Sabine

Quoting Valerio Basile <basile at di.unito.it>:

> Hi all,
> thank you Ted for bringing up this discussion, in which I also
> participated in person at SemEval, as one of the task organizers (Task 5).
> I personally support the idea of a stronger emphasis on the report and the
> analysis, rather than on the leaderboard. A "best analysis" award, or
> giving the oral slot to the best described system, or even keeping in the
> leaderboard only the teams who submitted a report, are all good ideas in
> this direction.
> I would also like to point to an alternative approach, that is, making the
> tasks *harder*, to encourage the participation of those who intend to study
> semantic phenomena, rather than only obtaining a high score. Here by harder
> I mean *formally* harder, in terms of structure, while not necessarily as
> in "AI-hard". From the organization of Task 5 (approx. 70 participating
> teams), and the results presented for other, more popular tasks, I got the
> impression that if you throw a text classification-like task at the crowd,
> the crowd will throw the latest neural network at it, very often with
> minimal (if any) study of the problem the task aims at modeling, neither
> before (motivation to use a specific model for a specific phenomenon) nor
> after (why the model performed as it did). As a result, at the end of a
> task like that, we have learned several new acronyms, but nothing on the
> actual problem, whether it be hate speech, offensive language, sentiment,
> or whatever else. Nothing wrong with neural networks of course, these and
> related supervised learning paradigms are central to NLP, but perhaps this
> kind of benchmarking of neural models on datasets is not completely in line
> with the overall goal of SemEval?
> In hindsight, in Task 5 (hate speech detection), I would have liked to ask
> the systems to predict e.g., "why this message can be considered hateful",
> "what is the target of hate", or "what hate speech law does this message
> violate", rather than "HS/not-HS".
> Understandably, casting tasks as more complex than tweet-level text
> classification makes them harder to organize, and probably would lead
> to far fewer submissions, so my proposal could be a bit extreme. Maybe the
> right solution is somewhere in the middle.
> Valerio
> On Sun, Jun 16, 2019 at 5:00 PM Ted Pedersen <tpederse at d.umn.edu> wrote:
>> Greetings all,
>> I posted this to various SemEval lists and Twitter, but was also
>> encouraged to send it here (to Corpora). Apologies if you've seen this
>> before!
>> -----------------
>> The SemEval workshop took place during the last two days of NAACL 2019
>> in Minneapolis, and included quite a bit of discussion both days about
>> the future of SemEval. I enjoyed this conversation (and participated
>> in it), so wanted to try and share some of what I think was said.
>> A few general concerns were raised about SemEval - one of them is that
>> many teams participate without then going on to submit papers
>> describing their systems. Related to this is that there are also
>> participants who never even really identify themselves to the task
>> organizers, and in effect remain anonymous throughout the event. In
>> both cases the problem is that in the end SemEval aspires to be an
>> academic event where participants describe what they have done in a
>> form that can be easily shared with other participants (and papers are
>> a good way to do that).
>> My own informal estimate is that maybe half of participating teams
>> submit a paper, and then half of those go on to attend the workshop
>> and present a poster. So if you see a task with 20 teams, perhaps 10
>> of them submit a paper and maybe 5 present a poster. SemEval is
>> totally ok with teams that submit a paper but do not attend the
>> workshop to present a poster. That has long been the case, and this
>> was confirmed again in Minneapolis. The goal then is to get more
>> participating teams to submit papers. There was considerable
>> discussion on the related issues of why don't more teams submit
>> papers, and how can we encourage (or require) the submission of more
>> papers?
>> One point made is that SemEval participants are sometimes new to our
>> community and so don't have a clear idea of what a "system description
>> paper" should consist of, and so might not submit papers because they
>> believe it will be too difficult or time consuming, or they just don't
>> know what to do and fear immediate rejection. There was considerable
>> support for the idea of providing a paper template that would help new
>> authors know what is expected.
>> It was also observed that when teams have disappointing results (not
>> top ranked) they might feel like a paper isn't really necessary or
>> might even be a bad idea. This tied into a larger discussion about the
>> reality that some (many?) participants in SemEval tasks focus on their
>> overall ranking and less on understanding the problem that they are
>> working on. There was discussion at various points about how to get
>> away from the obsession with the leaderboard, and to focus more on
>> understanding the problem that is being presented by the task. A
>> carefully done analysis of a system that doesn't perform terrifically
>> well can shed important light on a problem, while simply describing a
>> model and hyperparameter settings that might lead to high scores may
>> not be too useful in understanding that same problem.
>> One idea was for each task to award a "best analysis paper" and
>> potentially award the authors of that paper an oral presentation
>> during the workshop. Typically nearly all presentations at SemEval are
>> posters, and so the oral slots are somewhat coveted and are often (but
>> not always) awarded to the team with the highest rank. Shifting the
>> focus of prizes and presentations away from the leaderboard might tend
>> to encourage more participants to carry out such analysis and submit
>> papers.
>> That said, a carefully done analysis paper can be fairly time
>> consuming to create and may require more pages than the typical 4 page
>> limit. It was suggested that we be more flexible with page limits, so
>> that teams could submit fairly minimal descriptions, or go into more
>> depth on their systems and analysis. A related idea was to allow
>> analysis papers to be submitted to the SemEval year X+1 workshop based
>> on system participation in year X. This might be a good option to
>> provide since SemEval timelines tend to be pretty tight as it stands.
>> Papers sometimes tend to focus more on the horse race or bake off (and
>> so analysis is limited to reporting a rank or score in the task).
>> However, if scores or rankings were not released until after papers
>> were submitted then this could certainly change the nature of such
>> papers. In addition, a submitted paper could be made a requirement for
>> appearing on the leaderboard.
>> There is of course a trade off between increasing participation and
>> increasing the number of papers submitted. If papers are made into
>> requirements then some teams won't participate. There is perhaps a
>> larger question for SemEval to consider, and that is how to increase
>> the number of papers without driving away too many participants.
>> Another observation that was made was that some teams never identify
>> themselves and so participate in the task but are never really
>> involved beyond being on the leaderboard. These could of course be
>> shadow accounts created by teams who are already participating (to get
>> past submission limits?), or they could be accounts created by teams
>> who may only want to identify themselves if they end up ranking
>> highly. Should anonymous teams be allowed to participate? I don't know
>> that there was a clear answer to that question. While anonymous
>> participation could be a means to game the system in some way, it
>> might also be something done by those who are participating contrary
>> to the wishes of an advisor or employer. If teams are reluctant to
>> identify themselves for fear of being associated with a "bad" score,
>> perhaps it could be possible for teams to remove scores from the
>> leaderboard.
>> To summarize, I got the sense that there is some interest in both
>> increasing the number of papers submitted to SemEval, and also in
>> making it clear that there is more to the event than the leaderboard.
>> I think there were some great ideas discussed, and I fear I have done
>> a somewhat imperfect job of trying to convey those here, but I don't
>> want to let the perfect be the enemy of the good enough, so I'm going
>> to go ahead and send this around and hope that others who have ideas
>> will join in the conversation in some way.
>> Cordially,
>> Ted
>> PS Emily Bender pointed out that the following paper overlaps with some of
>> the issues mentioned in my summary. I'd strongly encourage all SemEval
>> organizers and participants to read through this; it is very much on target
>> and presents some nice ideas about how to think about shared tasks.
>> https://aclweb.org/anthology/papers/W/W17/W17-1608/
>> ---
>> Ted Pedersen
>> http://www.d.umn.edu/~tpederse
