[Corpora-List] Call for participation: SemEval 2015 Task 15: Corpus Pattern Analysis

El Maarouf, Ismail I.El-Maarouf at wlv.ac.uk
Thu Oct 9 16:45:00 CEST 2014

Dear colleagues and members of the Corpora list,

As part of the SemEval 2015 competition (http://alt.qcri.org/semeval2015/), we would like to invite you to participate in

TASK 15 - A CPA Dictionary-Entry-Building Task.

This task proposes to gradually get systems to write dictionary entries in the fashion of Corpus Pattern Analysis (see http://pdev.org.uk/#about_cpa) from raw data, by breaking the task into three steps...

A description of the task follows. For more details, and for access to the data, please see: http://alt.qcri.org/semeval2015/task15/

Please get in touch if you are interested, and use the following mailing list for any correspondence: semeval2015task15 at googlegroups.com

Kind regards,

Ismail El Maarouf, on behalf of the task organizers

Vít Baisa (Masaryk University, Brno, CZ),

Jane Bradbury (University of Wolverhampton, UK),

Ismaïl El Maarouf (University of Wolverhampton, UK),

Patrick Hanks (University of Wolverhampton, UK),

Adam Kilgarriff (Lexical Computing Ltd, UK),

Octavian Popescu (FBK, Trento, IT)

**************************************************
* TASK 15 - A CPA Dictionary-Entry-Building Task *
**************************************************

Corpus Pattern Analysis (CPA) is a new technique of language analysis, which identifies the main patterns in which words are used in text.

This task focuses on the current output of CPA (work in progress): the Pattern Dictionary of English Verbs (PDEV), a lexical resource which can be browsed here: http://pdev.org.uk. Unlike most semantic resources, PDEV starts by analysing corpus data rather than by speculating about possible meanings; as a general rule, only patterns found in the text samples are listed. Each pattern specifies a contextual environment in which the verb is used. This includes (among other things) the argument structure (subject, object, complement, adverbial) and, for each argument slot, the semantic type shared by the set of lexical items that fill it, taken from a corpus-based shallow semantic ontology (e.g. Human, Computer, Activity, etc.).
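For readers who think in code, the shape of a pattern as described above can be sketched as a small data structure. This is only an illustration: the class name, the field names, the double-bracket rendering and the example verb "operate" are assumptions made here, not the PDEV data format or the format of the task files. Only the slot names and the semantic types Human and Computer come from the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Pattern:
    """Illustrative container for one verb pattern (hypothetical, not the PDEV schema)."""
    verb: str
    # Maps an argument slot (subject, object, complement, adverbial)
    # to the semantic types allowed in that slot.
    slots: Dict[str, List[str]] = field(default_factory=dict)

    def render(self) -> str:
        """Return a one-line summary using only the subject and object slots."""
        subj = " | ".join(self.slots.get("subject", []))
        obj = " | ".join(self.slots.get("object", []))
        parts = [f"[[{subj}]]" if subj else "", self.verb, f"[[{obj}]]" if obj else ""]
        return " ".join(p for p in parts if p)


# Invented example: the verb is hypothetical, the types are from the ontology examples.
example = Pattern(verb="operate", slots={"subject": ["Human"], "object": ["Computer"]})
print(example.render())
```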

The goal of this task is to break down the different levels of analysis required to build a dictionary entry, and to offer each level as a separate step that NLP systems can tackle.

Three main sub-tasks have been identified:

1. CPA parsing: all sentences in the dataset must be syntactically and semantically analysed.

2. CPA clustering: all sentences in the dataset must be compared and grouped according to their similarities.

3. CPA lexicography: all verbs in the dataset must be described with a list of patterns.
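
The way the three sub-tasks feed into each other can be sketched on toy data. Everything below is an assumption made for illustration: the input records stand in for what a system might produce in sub-task 1, the exact-match grouping is a deliberately naive stand-in for similarity-based clustering in sub-task 2, and the output strings are not the entry format expected by the task scorers.

```python
from collections import defaultdict

# Hypothetical result of sub-task 1 (CPA parsing): each sentence is reduced
# to a mapping from argument slot to semantic type. Data invented for illustration.
parsed = [
    {"id": 1, "verb": "operate", "args": {"subject": "Human", "object": "Computer"}},
    {"id": 2, "verb": "operate", "args": {"subject": "Human", "object": "Computer"}},
    {"id": 3, "verb": "operate", "args": {"subject": "Human", "adverbial": "Activity"}},
]


def signature(sentence):
    """Order-independent key built from the slot/type pairs of one sentence."""
    return tuple(sorted(sentence["args"].items()))


def cluster(sentences):
    """Sub-task 2 stand-in: group sentence ids whose signatures are identical."""
    groups = defaultdict(list)
    for s in sentences:
        groups[signature(s)].append(s["id"])
    return dict(groups)


def describe(verb, clusters):
    """Sub-task 3 stand-in: emit one pattern line per cluster."""
    return [
        f"{verb}: " + ", ".join(f"{slot}={sem}" for slot, sem in sig)
        for sig in clusters
    ]


clusters = cluster(parsed)
entry = describe("operate", clusters)
for line in entry:
    print(line)
```

A real system would replace the exact-match signature with a graded similarity measure, since sentences that instantiate the same pattern rarely share identical annotations.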

Two distinct datasets are available as training data, with several files for each verb in standard formats. A scorer for each sub-task and the CPA ontology are also provided. The verbs in the test set will be different from those in the training set.

