[Corpora-List] Corpora Digest, Vol 121, Issue 16

ipek baris ipekbrs at gmail.com
Wed Jul 12 10:06:46 CEST 2017


Dear all,

I am looking for free datasets for IR evaluation (like the TREC datasets), and I also need an annotated entity linking/disambiguation dataset that links to DBpedia, Wikipedia, or WordNet. I have already tried the AIDA/CoNLL datasets, but they are not sufficient for optimizing the parameters of the disambiguation task.

I would highly appreciate any help. Thank you for your time.

Best,

Ipek Baris.

On 11 Jul 2017 at 13:00, <corpora-request at uib.no> wrote:


> Today's Topics:
>
> 1. 2nd CfP: IJCNLP 2017 Shared Task - Dimensional Sentiment
> Analysis for Chinese Phrases (Lung-Hao Lee)
> 2. 2nd CfP: IJCNLP 2017 Workshop on Natural Language Processing
> Techniques for Educational Applications (NLPTEA 2017) (Lung-Hao Lee)
> 3. EXTENDED DEADLINE: LT4DH-CEE Workshop at RANLP 2017
> (Petya Osenova)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 11 Jul 2017 12:01:34 +0800
> From: Lung-Hao Lee <lunghaolee at gmail.com>
> Subject: [Corpora-List] 2nd CfP: IJCNLP 2017 Shared Task - Dimensional
> Sentiment Analysis for Chinese Phrases
> To: corpora <corpora at uib.no>
> Cc: Kam-Fai Wong <kfwong at se.cuhk.edu.hk>, Lung-Hao Lee
> <lhlee at ntnu.edu.tw>, Jin Wang <wangjin at ynu.edu.cn>, Liang-Chih
> Yu
> <lcyu at saturn.yzu.edu.tw>
>
> ----------------------------------------------------------------------
> The 8th International Joint Conference on Natural Language Processing
> (*IJCNLP 2017*)
> November 27 - December 1, 2017, in Taipei, Taiwan
> http://ijcnlp2017.org/
> ----------------------------------------------------------------------
> (With apologies for cross-posting)
>
> *Call for Participation*
>
> *IJCNLP 2017 Shared Task:*
> *Dimensional Sentiment Analysis for Chinese Phrases*
> http://nlp.innobic.yzu.edu.tw/tasks/dsa_p/
>
> Sentiment lexicons with valence-arousal ratings are useful resources for
> the development of dimensional sentiment applications. Due to the limited
> availability of such VA lexicons, especially for Chinese, the objective of
> the task is to automatically acquire the valence-arousal ratings of Chinese
> affective words and phrases.
>
> Given a word or phrase, participants are asked to provide a real-valued
> score from 1 to 9 for both valence and arousal dimensions, indicating the
> degree from most negative to most positive for valence, and from most calm
> to most excited for arousal. The input format is "term_id, term", and the
> output format is "term_id, valence_rating, arousal_rating". Below are the
> input/output formats of the example words ? (good), ??? (very good), ??
> (satisfy), and ??? (not satisfy).
>
>
>
> - *Example 1*:
> Input: 1, ?
> Output: 1, 6.8, 5.2
> - *Example 2*:
> Input: 2, ???
> Output: 2, 8.500, 6.625
> - *Example 3*:
> Input: 3, ??
> Output: 3, 7.2, 5.6
> - *Example 4*:
> Input: 4, ???
> Output: 4, 2.813, 5.688
>
> *Data*
>
> - Training Set:
> - For words: 2,802 single words annotated with valence-arousal
> ratings (CVAW 2.0) (Yu et al., 2016a).
> - For phrases: 2,250 multi-word phrases annotated with
> valence-arousal ratings
> - Test set:
> - 750 single words and 750 multi-word phrases. The policy of this
> shared task is an open test. Participating systems are allowed to use
> other publicly available data for this shared task, but the use of
> other data should be specified in the final technical report.
>
> *Evaluation*
>
> The performance is evaluated by examining the difference between
> machine-predicted ratings and human-annotated ratings (valence and arousal
> are treated independently). The evaluation metrics include:
>
> - Mean absolute error
> - Pearson correlation coefficient
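As a rough illustration of the two metrics listed above (a minimal sketch, not the organizers' official scorer), the following Python computes mean absolute error and the Pearson correlation coefficient for one dimension at a time. The function names and the predicted values are hypothetical; the gold values are the valence ratings from the four examples in the call.

```python
import math


def mean_absolute_error(predicted, gold):
    """Average absolute difference between predicted and gold ratings."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - g) for p, g in zip(predicted, gold)) / len(gold)


def pearson_r(predicted, gold):
    """Pearson correlation coefficient between predicted and gold ratings."""
    n = len(gold)
    if len(predicted) != n or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_p = sum(predicted) / n
    mean_g = sum(gold) / n
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(predicted, gold))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_g = sum((g - mean_g) ** 2 for g in gold)
    if var_p == 0 or var_g == 0:
        raise ValueError("correlation is undefined for constant ratings")
    return cov / math.sqrt(var_p * var_g)


# Gold valence ratings taken from Examples 1-4 in the call.
gold_valence = [6.8, 8.5, 7.2, 2.813]
# Hypothetical system predictions, invented purely for illustration.
predicted_valence = [6.5, 8.0, 7.0, 3.5]

print("MAE:", mean_absolute_error(predicted_valence, gold_valence))
print("Pearson r:", pearson_r(predicted_valence, gold_valence))
```

Because valence and arousal are treated independently, the same two functions would simply be called a second time on the arousal ratings.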
>
> *Registration*
>
> Participants need to register in order to obtain the training and test
> data. To register, please send the following information to Lung-Hao Lee
> (lhlee at ntnu.edu.tw):
>
> - Team Name
> - Organization of your team
> - Name and E-mail address of contact person for your team
>
> *Important Dates*
>
> - Registration open: May 15, 2017
> - Release of training data: May 15, 2017
> - Registration close: August 11, 2017
> - Release of test data: August 14, 2017
> - Testing results submission due: August 21, 2017
> - Release of evaluation results: August 31, 2017
> - System description paper due: September 15, 2017
> - Notification of acceptance: September 30, 2017
> - Camera-ready deadline: October 10, 2017
> - Shared task date: December 1, 2017
>
> *Organizers*
>
> - Liang-Chih Yu (Yuan Ze University)
> - Lung-Hao Lee (National Taiwan Normal University)
> - Jin Wang (Yunnan University)
> - Kam-Fai Wong (The Chinese University of Hong Kong)
>
>
> --
> Lung-Hao Lee (???), Ph.D.
> Postdoctoral Fellow & Adjunct Assistant Professor
> Graduate Institute of Library and Information Studies
> National Taiwan Normal University
> Email: lhlee at ntnu.edu.tw
> Web: http://web.ntnu.edu.tw/~lhlee/
> --
>
> ------------------------------
>
> Message: 2
> Date: Tue, 11 Jul 2017 12:03:51 +0800
> From: Lung-Hao Lee <lunghaolee at gmail.com>
> Subject: [Corpora-List] 2nd CfP: IJCNLP 2017 Workshop on Natural
> Language Processing Techniques for Educational Applications (NLPTEA
> 2017)
> To: corpora <corpora at uib.no>
> Cc: pcfung at se.cuhk.edu.hk, Lung-Hao Lee <lhlee at ntnu.edu.tw>,
> Liang-Chih Yu <lcyu at saturn.yzu.edu.tw>, Yuen-Hsien Tseng
> <samtseng at ntnu.edu.tw>, Hsin-Hsi Chen <hhchen at ntu.edu.tw>
>
> ----------------------------------------------------------------------
>
> The 4th Workshop on Natural Language Processing Techniques for Educational
> Applications (*NLPTEA 2017*) with a Shared Task for Chinese Spelling Check
> (*CSC*)
>
> December 1st, 2017 at Taipei, Taiwan (in conjunction with *IJCNLP 2017*)
>
>
> NLPTEA 2017: https://sites.google.com/view/nlptea2017
>
> CSC Shared Task: https://www.labviso.com/nlptea2017/
>
> ----------------------------------------------------------------------
>
> The aim of this workshop is to provide a forum where international
> participants can share knowledge on computer-assisted language learning.
> Over the past decade, research and development in the computational
> linguistics community has advanced NLP techniques for educational
> applications. For example, a series of workshops on Innovative Use of NLP
> for Building Educational Applications (BEA) were organized to improve
> existing capabilities and to generate creative ways to use NLP in
> educational applications for writing, reading, assessment, and so on. In
> addition, a number of competitive tasks have been organized to encourage
> innovation in automated grammatical error detection and correction. For
> example, the CoNLL 2013/2014 shared tasks aimed to correct grammatical
> errors made by learners of English as a foreign language. All of these
> workshops and competitions increase the visibility of educational
> application research in the NLP community.
>
> However, the workshops and shared tasks mentioned above predominantly focus
> on English language learning. Unlike the English learning setting, for
> which many learning technologies have been developed, learning tools to
> support Asian language learners are relatively rare. In response, we
> organized the first NLPTEA workshop in conjunction with the 22nd
> International Conference on Computers in Education (ICCE 2014 in Nara,
> Japan), the flagship conference in the computer education area. The 2nd
> NLPTEA workshop was held in conjunction with ACL-IJCNLP 2015 (Beijing,
> China). The 3rd NLPTEA workshop was held at COLING 2016 (Osaka, Japan).
>
> NLPTEA is the annual workshop of the Special Interest Group on
> Computer-Assisted Language Learning (SIGCALL) of the Association for
> Computational Linguistics and Chinese Language Processing (ACLCLP). The
> purpose of NLPTEA 2017 is to identify challenging problems facing the
> development of computer-assisted techniques for Asian language learning,
> and to shape future research directions through the publication of applied
> and theoretical research findings. To this end, this year we will have a
> shared task on Chinese Spelling Check.
>
> - *Annotation schema and error tagging*
> - *Assessment of learners' language proficiency*
> - *Automated essay scoring*
> - *Collaborative learning environments*
> - *Content analysis for assessment*
> - *Discourse and stylistic analysis*
> - *Educational data mining*
> - *E-learning tools for personalized course content*
> - *Grammatical error detection and correction*
> - *Intelligent tutoring systems*
> - *Learner corpus development and evaluation*
> - *Native language identification*
> - *NLP tools for language learning*
> - *Plagiarism detection*
> - *Second language acquisition*
> - *Sentence judgment system*
> - *Spelling error checking*
> - *Structural analysis of argumentation*
> - *Tools and applications for teachers, courseware, tests, and students*
>
> We invite authors to submit the following 2 types of papers:
>
> - *Full papers* (8 pages, plus 2 extra pages for references) that report
> solid and completed work with new experiments, findings and/or approaches
> - *Short papers* (4 pages, plus 2 extra pages for references) that report
> a small, focused contribution, work in progress, a negative result, or an
> interesting application nugget
>
> Accepted papers will be presented orally or as posters. The decision as to
> which papers will be presented orally and which as posters will be made by
> the program committee based on the nature rather than on the quality of the
> work.
>
> *Submission Format*:
>
> Submissions must be in PDF, and must conform to the official style
> guidelines in two-column format for IJCNLP 2017. We ask you to use the
> provided LaTeX style files or Word template. Authors are strongly
> discouraged from modifying the style files. Submissions that do not conform
> to the required styles, including paper size, margin width, and font size
> restrictions, will be rejected without review.
>
> *Submission Guidelines*:
>
> Submitted papers should be substantially original and unpublished. The
> reviewing process will be double-blind. Therefore papers must not include
> authors' names and affiliations. Furthermore, self-references that reveal
> the authors' identity, e.g., "We previously showed (Smith, 1991) ..." must
> be avoided. Instead, use citations such as "Smith previously showed (Smith,
> 1991) ..." Papers that do not conform to these requirements will be rejected
> without review. In addition, please do not post your submissions on the web
> until after the review process is complete.
>
> *Important Dates*:
>
> - *Paper submission deadline*: *September 5, 2017* (23:59 UTC-7)
> - Notification of acceptance: September 30, 2017
> - Camera-ready deadline: October 10, 2017
> - Workshop date: November 27 or December 1, 2017
>
> *Workshop Organizers*
>
> - Yuen-Hsien Tseng, National Taiwan Normal University
> - Hsin-Hsi Chen, National Taiwan University
> - Lung-Hao Lee, National Taiwan Normal University
> - Liang-Chih Yu, Yuan Ze University
>
> *Shared Task Organizers*
>
> - Gabriel Pui Cheong Fung, The Chinese University of Hong Kong
> - Jia Zhu, South China Normal University
>
>
> --
> Lung-Hao Lee (???), Ph.D.
> Postdoctoral Fellow & Adjunct Assistant Professor
> Graduate Institute of Library and Information Studies
> National Taiwan Normal University
> Email: lhlee at ntnu.edu.tw
> Web: http://web.ntnu.edu.tw/~lhlee/
> --
>
> ------------------------------
>
> Message: 3
> Date: Tue, 11 Jul 2017 08:17:48 +0000
> From: Petya Osenova <petyaosenova at hotmail.com>
> Subject: [Corpora-List] EXTENDED DEADLINE: LT4DH-CEE Workshop at RANLP
> 2017
> To: "corpora at uib.no" <corpora at uib.no>
>
> First Workshop on Language Technology for Digital Humanities in Central
> and (South-) Eastern Europe (LT4DH-CEE)
>
> in conjunction with the 11th biennial Recent Advances in Natural Language
> Processing conference (RANLP 2017) http://lml.bas.bg/ranlp2017/start.php
> which will take place on September 4-8, 2017, in Varna, Bulgaria.
>
> DEADLINE EXTENDED 23.07.2017
>
>
>
> Motivation
>
>
>
> Over the last decades, Digital Humanities has evolved dramatically, from
> simple database applications to complex systems that draw on the most
> recent state of the art in Computer Science. Language Technology in
> particular plays a major role, both in processing the metadata of recorded
> objects and in analyzing and interpreting content.
>
>
>
> Applying language technology methods to objects from the humanities is a
> challenge for NLP research: data is heterogeneous (image/text), often
> incomplete (e.g. OCR errors), multilingual within one document (historic
> documents with Latin and/or classical Greek paragraphs), and difficult to
> structure (paragraphs, titles, and pages are somewhat different in
> historical texts).
>
>
>
> Corpus-based methods, nowadays standard in NLP research, often cannot be
> applied because the necessary large training data is missing.
>
> Moreover, the requirements for digital humanities tools, especially tools
> dedicated to cultural heritage objects, differ from those for tools
> applied to modern texts.
>
>
>
> Thus, research in Digital Humanities also involves adapting existing NLP
> tools to historical variants of languages, developing tools for new
> languages, making tools robust to syntactic deviation, and adapting
> semantic resources.
>
> Central and Eastern Europe has always been characterized by a high
> concentration of languages and cultures. Unfortunately, many historical
> documents from this region are in bad condition; many languages or dialects
> have become extinct over time, and written evidence of them is rare.
>
>
>
> Digital Humanities seems the perfect means for preserving and
> investigating this rich cultural heritage. However, up to now, dedicated
> activities seem to be missing, probably also due to the lack of adequate
> NLP resources and tools. It is therefore imperative to evaluate existing
> technology, monitor current activities, and network research teams in this
> area; these are the aims of the proposed workshop.
>
>
>
> Topics
>
>
>
> We are looking for original unpublished work related to (but not limited
> to) one of the following topics:
>
>
>
> - Corpora for diachronic variants and the dialects of languages in Central
> and Eastern Europe (CEE);
>
> - NLP Tools for documents of historic, political, philosophical,
> archeological content in CEE;
>
> - Digital Humanities applications related to CEE;
>
> - Evaluation of current frameworks (CLARIN, DARIAH) on DH-objects related
> to CEE;
>
> - DH objects as Linked Open Data sets in CEE;
>
> - DH types of resources in CEE (texts, images, artefacts, multimodal
> objects, etc.);
>
> - Problematic issues related to tracking, digitizing, processing,
> annotating and preserving the DH objects in CEE;
>
> - Good practices for handling under-resourced DH objects.
>
>
>
> Submissions
>
>
>
> Please submit your paper through the START system at
>
>
>
> https://www.softconf.com/ranlp2017/LTDHCSEE/
>
>
>
> The reviewing process is anonymous. Double submission is allowed, but
> authors will be asked to declare it at the time of submission.
>
>
>
> Long papers should be 8 pages long plus 2 extra pages for references.
>
> Short papers should be 6 pages long plus 2 extra pages for references.
> Accepted short papers will be presented either as short oral presentations
> or as posters.
>
>
>
> All submissions should be formatted using the ACL-based stylesheets
> provided for RANLP (http://lml.bas.bg/ranlp2017/submissions.php#styles).
>
> Accepted papers will be published in the workshop proceedings and uploaded
> to the ACL Anthology.
>
>
>
> Important Dates:
>
>
> Paper submission deadline: July 23, 2017 (extended)
> Notification of acceptance: August 11, 2017
> Camera-ready papers due: August 25, 2017
> LT4DH-CEE Workshop: September 8, 2017
>
>
>
>
>
> Organizing Committee
>
>
>
> Anca Dinu, University of Bucharest, Romania
>
> Petya Osenova, Bulgarian Academy of Sciences, Bulgaria
>
> Cristina Vertan, University of Hamburg, Germany
>
>
>
> Programme Committee
>
>
>
> Liviu Dinu, University of Bucharest
>
> Antske Fokkens, Vrije Universiteit Amsterdam
>
> Walther v. Hahn, University of Hamburg
>
> Vladislav Kubon, Charles University Prague
>
> Maciej Ogrodniczuk, Polish Academy of Sciences
>
> Gabor Proszeky, Catholic University Budapest
>
> Kiril Simov, Bulgarian Academy of Sciences
>
> Stefan Trausan, Politechnics University Bucharest
>
>