[Corpora-List] Fwd: CfP: Concept Extraction Challenge at the 3rd Workshop on Making Sense of Microposts (#MSM2013) @WWW2013 - chance to win $1500

Andrea Varga andrea.job06
Fri Jan 18 00:50:43 CET 2013


apologies for cross-posting

==========================================================================
Concept Extraction Challenge @ the 3rd Workshop on Making Sense of Microposts (#MSM2013)
at WWW 2013
http://oak.dcs.shef.ac.uk/msm2013/challenge.html
13th May 2013. Rio de Janeiro, Brazil
===========================================================================

#MSM2013 will host a 'Concept Extraction Challenge', with a prize sponsored by eBay, where participants must label Microposts in a given dataset with the concepts referenced. Existing concept extraction tools are intended for use over news corpora and similar document-based corpora of relatively long length. The aim of the challenge is to foster research into novel, more accurate concept extraction for (much shorter) Micropost data.

The goal of the challenge is to detect concepts contained in Microposts. Concepts are defined as abstract notions of things; for this challenge we are constraining the task to the extraction of entity concepts characterised by an entity type and an entity value. We consider four entity types defined as follows:

1. Person (PER) - references in the Micropost to a full or partial person name.
   Example: Obama responds to diversity criticism
   Extracted instances: PER/Obama;

2. Location (LOC) - references in the Micropost to full or partial location names including: cities, provinces or states, countries, continents and (physical) facilities.
   Example: Finally on the train to London ahhhh
   Extracted instances: LOC/London;

3. Organisation (ORG) - references in the Micropost to full or partial organisation names including academic, state, governmental, military and business or enterprise organisations.
   Example: NASA's Donated Spy Telescopes May Aid Dark Energy Search
   Extracted instances: ORG/NASA;

4. Miscellaneous (MISC) - references in the Micropost to a concept not covered by any of the categories above, but limited to one of the entity types: film/movie, entertainment award event, political event, programming language, sporting event, TV show, nationality, and (spoken or written) language.
   Example: Okay, now this is getting seriously bizarre. Like a Monty Python script gone wrong.
   Extracted instances: MISC/Monty Python;

DATASET
-----
Two datasets covering a variety of topics of discussion have been provided: one for training and one for testing. The complete dataset (both training and testing data) contains 4265 manually annotated microposts using the above definitions. The dataset is split by 60%/40% for training and testing.

Training Dataset
-----
Tab-separated data with the following element indices per micropost:
- Element 1: The numeric ID of the micropost
- Element 2: The concepts found within the micropost, described by an entity type and an entity instance. These are semi-colon separated values (e.g. PER/Obama;ORG/NASA).
- Element 3: The content of the micropost - this is what the concepts were detected and extracted from.

Test Dataset
-----
Also tab-separated data, but unlike the training dataset the concepts have not been extracted:
-Element 1: The numeric ID of the micropost
-Element 2: The content of the micropost, this is what you must use to detect and extract the concepts contained.

Anonymisation and Special Terms
-----
To ensure anonymity all username mentions in the microposts have been replaced with '_Mention_', and all URLs with '_URL_'.

Data Access
-----
The datasets can be downloaded from: http://oak.dcs.shef.ac.uk/msm2013/ie_challenge

EVALUATION
------------
In order to evaluate your submissions we require you to submit (along with a paper describing your approach) a tab-separated value (TSV) file with the following format for the microposts in the test dataset:
-Element 1: The numeric ID of the micropost.
-Element 2: The entity type and entity instance detected in each micropost. These are semi-colon separated values (e.g. PER/Obama;ORG/NASA).

For instance, your results would be formatted as:

2560	PER/Obama;ORG/NASA
2561
2562	ORG/FDA;
...

This file will be parsed and the accuracy of each approach computed. Accuracy will be judged using the f-measure (with beta = 1 so precision and recall are weighted equally). This will be computed on a per entity-type/entity-instance pair basis and then averaged across the four entity types. We will also provide entity-type specific f-measure values for each team to assess how each approach fares across the different concepts.

PRIZE
------------
The best submission to the Micropost Concept Extraction Challenge will receive an award of (US)$1500, generously sponsored by eBay. Information extraction challenges associated with treating eBay items, often of short textual content, are very similar to those encountered when treating other short textual microposts. By teaming up with eBay to make the challenge possible, the MSM workshop organisers wish to highlight this aspect of the micropost extraction research question.

The Challenge Committee will judge submissions based on the outcome of the evaluation procedure described above, and a review of the extended abstracts, to obtain insight into the quality and applicability of the approaches taken. A selection of the submissions accepted will be presented at the challenge. All accepted submissions will be published in a separate CEUR compendium and made available from the workshop website.

SUBMISSIONS
------------
Submissions should be made as a zip file using your system name as the file name (e.g. 'awesomeo9000.zip'), containing:
1. a TSV file with your system name (e.g. 'awesomeo9000.tsv').
2. an extended abstract of 2 pages describing your approach and how you tuned/tested it using the training split.
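
For illustration only, the result format and the scoring described under EVALUATION could be sketched roughly as below. This is not the organisers' evaluation script. The function names are hypothetical, and the sketch assumes exact, case-sensitive matching of entity-type/entity-instance pairs and a score of 0 when a denominator is empty; the announcement does not specify either point.

```python
ENTITY_TYPES = ("PER", "LOC", "ORG", "MISC")

def parse_concepts(field):
    """Split 'PER/Obama;ORG/NASA' into a set of (type, value) pairs."""
    pairs = set()
    for item in field.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate empty fields and a trailing ';'
        etype, _, value = item.partition("/")
        pairs.add((etype, value))
    return pairs

def parse_results(lines):
    """Map micropost ID -> set of (type, value) pairs from TSV lines."""
    results = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        post_id, _, concepts = line.partition("\t")
        results[post_id.strip()] = parse_concepts(concepts)
    return results

def f1_by_type(gold, predicted):
    """F-measure (beta = 1) per entity type over type/instance pairs."""
    scores = {}
    for etype in ENTITY_TYPES:
        tp = fp = fn = 0
        for post_id in set(gold) | set(predicted):
            g = {p for p in gold.get(post_id, set()) if p[0] == etype}
            s = {p for p in predicted.get(post_id, set()) if p[0] == etype}
            tp += len(g & s)
            fp += len(s - g)
            fn += len(g - s)
        # Assumption: an empty denominator yields 0.0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores[etype] = 2 * precision * recall / total if total else 0.0
    return scores

def macro_f1(scores):
    """Average the per-type f-measures across the four entity types."""
    return sum(scores.values()) / len(scores)

# Made-up gold annotations and system output in the two-element format.
gold = parse_results(["1\tPER/Obama;ORG/NASA", "2\t", "3\tLOC/London"])
system = parse_results(["1\tPER/Obama", "2\t", "3\tLOC/Paris;"])
per_type = f1_by_type(gold, system)
overall = macro_f1(per_type)
```

A team could use something of this kind on a held-out part of the training split to get a rough local estimate before submitting.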

Written submissions should be prepared according to the ACM SIG Proceedings Template (see http://www.acm.org/sigs/publications/proceedings-templates), and should include author names and affiliations, and 3-5 keywords.

Submission is via the EasyChair Conference System, at:
https://www.easychair.org/conferences/?conf=msm2013challenge

IMPORTANT DATES
----------------
Challenge Data release: 17 Jan 2013
Intent to submit to challenge: 03 Mar 2013
Challenge Submission deadline: 17 Mar 2013
Challenge Notification: 31 Mar 2013
Challenge camera-ready deadline: 07 Apr 2013
(all deadlines 23:59 Hawaii Time)

Workshop program issued: 09 Apr 2013
Challenge proceedings to be published via CEUR
Workshop - 13 May 2013 (Registration open to all)

CONTACT
---------------
E-mail: msm2013-0 at easychair.org
Facebook Group: http://www.facebook.com/#!/home.php?sk=group_180472611974910
Facebook Public Event page: http://www.facebook.com/events/116134955169543
Twitter hashtag: #msm2013
W3C Microposts Community Group: http://www.w3.org/community/microposts

WORKSHOP ORGANISERS
---------------------
Matthew Rowe, Lancaster University, UK
Milan Stankovic, Université Paris-Sorbonne, France
Aba-Sah Dadzie, The University of Sheffield, UK
------------------
Challenge Chair: A. Elizabeth Cano, KMi, The Open University, UK
Steering Committee & Local Chair: Bernardo Pereira Nunes, PUC-Rio, Brazil / L3S Research Center, Germany

Evaluation Committee:
------------------
Naren Chittar, eBay, USA
Peter Mika, Yahoo! Research, Spain
Andrea Varga, OAK Group, University of Sheffield, UK


